Python: 3D grid from JSON data

Question

I have been struggling with a Python-related problem (or challenge) for quite some time now, and I believe I should ask you folks for some help or hints.

In the [attached file](https://aarhusvand-my.sharepoint.com/:u:/g/personal/anl_aarhusvand_dk/ESEC8xbP2OJCpLobZTRcGt8B09b_GpHzxcRJU2na1rAaJg?e=CVa0Tt) you will find a JSON file object with the data I'm working with. I'm loading the data using the following lines of code:

```python
import json

with open('json_data.json', 'r') as openfile:
    json_object = json.load(openfile)
```

Now, when looking into json_object, you will notice a lot of relevant information, such as the number of cells in each of the two spatial dimensions, Numx and Numy. The key I am interested in is the data key, i.e. json_object['data']. It is a list holding Numx by Numy dictionary instances, each with an X and a Y coordinate. Each instance also holds the third dimension, time, under the key Data, which is a list of n timesteps of values. It may be obvious, but I feel I should mention that the timestamps are the same for each coordinate instance.

So, to summarize, for each coordinate there is a time series of values, which I would like to convert into a 3D grid using numpy. How would I do that?

I've tried to convert the data key into a Pandas DataFrame with this:

```python
import pandas as pd
df = pd.json_normalize(json_object["data"], record_path=["Data"], meta=["X", "Y"])
```

This gives me a DataFrame with values corresponding to the given timestep and coordinate.
But then I don't know how to continue - how do I turn this into a 3D grid?
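
For illustration, here is a minimal sketch of what that call returns on a tiny made-up sample (two cells, two timesteps). The coordinates are borrowed from the example data further down, but the values are invented and `sample` is a hypothetical stand-in for json_object['data']:

```python
import pandas as pd

# Hypothetical stand-in for json_object['data']: two cells, two timesteps each.
sample = [
    {'X': 572125, 'Y': 6226875,
     'Data': [{'Time': '2023-07-10T00:00:00', 'Value': 0},
              {'Time': '2023-07-10T00:01:00', 'Value': 0.259}]},
    {'X': 571125, 'Y': 6226125,
     'Data': [{'Time': '2023-07-10T00:00:00', 'Value': 0},
              {'Time': '2023-07-10T00:01:00', 'Value': 0.321}]},
]

# One row per (cell, timestep); the meta keys X and Y are repeated on every row.
df = pd.json_normalize(sample, record_path=['Data'], meta=['X', 'Y'])
print(df)
```

The result is in "long" format: the columns are Time, Value, X and Y, with one row for every combination of cell and timestep.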

Then I tried to loop over each timestamp, so I would have n timesteps of 2D grids. But then I struggle with how to add the time dimension in order to make it 3-dimensional.

```python
# Timestamps are identical for every cell, so read them from the first one.
timeStamps = [t['Time'] for t in json_object['data'][0]['Data']]
dfTimestamps = {}
for i, ts in enumerate(timeStamps):
    dfTimestamps[ts] = {}
    X = []
    Y = []
    vals = []
    # Collect the coordinates and the i-th value of every cell.
    for d in json_object['data']:
        X.append(d['X'])
        Y.append(d['Y'])
        vals.append(d['Data'][i]['Value'])
    dfTimestamps[ts]['X'] = X
    dfTimestamps[ts]['Y'] = Y
    dfTimestamps[ts]['Value'] = vals
```

EDIT:
I will try to write some example data from the JSON file object below.

```python
{'info': {'Parameters': None,
  'Unit': 'mm/hour',
  'Location': 'Input geojson',
  'Point': {'IdPoints': 0,
   'Name': None,
   'Description': None,
   'X': 571125,
   'Y': 6225625,
   'EPSG': None,
   'Latitude': 0,
   'Longitude': 0,
   'CreatedDatetime': '0001-01-01T00:00:00',
   'OrganizationId': None,
   'ResponsibleUserId': None,
   'Id': 0},
  'PointId': None,
  'ParameterId': 'Rainintensity, id: 204',
  'Timezone': 'UTC',
  'DataSource': 'X-band Sabro, RadarId: {303}',
  'EPSG': '32632',
  'CreatedDateTime': '2023-07-13T13:55:18.9099301Z',
  'AllPoints': False,
  'dxdy': 250,
  'Numx': 7,
  'Numy': 6,
  'X0': 0,
  'Y0': 0,
  'MissingSteps': 0,
  'ProcessQuality': [{'qualityIndex': 0,
    'qualityDescription': 'No problem detected',
    'qualitySteps': 3241,
    'missingSteps': None,
    'fromUTC': '2023-07-10T00:00:00',
    'toUTC': '2023-07-12T06:00:00'}]},
 'data': [{'X': 572125,
   'Y': 6226875,
   'Data': [{'Time': '2023-07-10T00:00:00', 'Value': 0},
    {'Time': '2023-07-10T00:01:00', 'Value': 0},
    {'Time': '2023-07-10T00:02:00', 'Value': 0},
    {'Time': '2023-07-10T00:03:00', 'Value': 0},
    {'Time': '2023-07-10T00:04:00', 'Value': 0},
    {'Time': '2023-07-10T00:05:00', 'Value': 0},
    {'Time': '2023-07-10T00:06:00', 'Value': 0},
    {'Time': '2023-07-10T00:07:00', 'Value': 0},
    {'Time': '2023-07-10T00:08:00', 'Value': 0.399},
    {'Time': '2023-07-10T00:09:00', 'Value': 0},
    {'Time': '2023-07-10T00:10:00', 'Value': 0},
    {'Time': '2023-07-10T00:11:00', 'Value': 0},
    ...
    {'Time': '2023-07-10T16:37:00', 'Value': 0.299},
    {'Time': '2023-07-10T16:38:00', 'Value': 0},
    {'Time': '2023-07-10T16:39:00', 'Value': 0},
    ...]},
  {'X': 572125,
   'Y': 6226875,
   'Data': [{'Time': '2023-07-10T00:00:00', 'Value': 0},
    {'Time': '2023-07-10T00:01:00', 'Value': 0},
    {'Time': '2023-07-10T00:02:00', 'Value': 0},
    {'Time': '2023-07-10T00:03:00', 'Value': 0},
    {'Time': '2023-07-10T00:04:00', 'Value': 0},
    {'Time': '2023-07-10T00:05:00', 'Value': 0},
    {'Time': '2023-07-10T00:06:00', 'Value': 0},
    {'Time': '2023-07-10T00:07:00', 'Value': 0},
    {'Time': '2023-07-10T00:08:00', 'Value': 0.399},
    {'Time': '2023-07-10T00:09:00', 'Value': 0},
    {'Time': '2023-07-10T00:10:00', 'Value': 0},
    ...
    {'Time': '2023-07-10T16:37:00', 'Value': 0.299},
    {'Time': '2023-07-10T16:38:00', 'Value': 0},
    {'Time': '2023-07-10T16:39:00', 'Value': 0},
    ...]},
  {'X': 571125,
   'Y': 6226125,
   'Data': [{'Time': '2023-07-10T00:00:00', 'Value': 0},
    {'Time': '2023-07-10T00:01:00', 'Value': 0},
    {'Time': '2023-07-10T00:02:00', 'Value': 0.259},
    {'Time': '2023-07-10T00:03:00', 'Value': 0},
    {'Time': '2023-07-10T00:04:00', 'Value': 0},
    {'Time': '2023-07-10T00:05:00', 'Value': 0.321},
    {'Time': '2023-07-10T00:06:00', 'Value': 0.279},
    {'Time': '2023-07-10T00:07:00', 'Value': 0},
    {'Time': '2023-07-10T00:08:00', 'Value': 0.371},
    {'Time': '2023-07-10T00:09:00', 'Value': 0.399},
    {'Time': '2023-07-10T00:10:00', 'Value': 0.345},
    ...
```
So, at the beginning of my file there is the info key, which holds relevant metadata. The data key holds the information that I want to convert into a 3D grid. For each coordinate there is a time series like the ones shown above, where I have included examples of 3 coordinate instances.

Answer 1

Score: 0

Your attempt

```python
df = pd.json_normalize(json_object["data"], record_path=["Data"], meta=["X", "Y"])
```

is basically correct. You then ask:

> how do I turn this into a 3D grid?

Perhaps you mean a 2D array of values, but this is complicated by many factors, including that you have a lot of duplicates and there is irregular grid coverage in your data:

```python
import json
import pandas as pd

with open('json_data.json') as file:
    data = json.load(file)['data']

# Long format: one row per (cell, timestep) with columns Time, Value, X, Y.
df = pd.json_normalize(data=data, record_path='Data', meta=['X', 'Y'])
df['Time'] = pd.to_datetime(df.Time)

# Average over all timesteps (and duplicates) per cell, then pivot X into columns.
df = (
    df.set_index(['Y', 'X'])
    .Value.groupby(level=['Y', 'X'])
    .mean()
    .unstack(level='X')
)
```
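
To see how much duplication and missing coverage there is before pivoting, something like the following could be used. This is a minimal sketch on a hypothetical four-row frame shaped like the `json_normalize` output. The values are made up, and "missing" here simply means cells of the bounding X/Y rectangle that have no rows at all, which is an assumption about what irregular coverage means for your data:

```python
import pandas as pd

# Hypothetical long-format frame, shaped like the json_normalize output.
df = pd.DataFrame({
    'Time': pd.to_datetime(['2023-07-10T00:00:00'] * 4),
    'Value': [0.0, 0.0, 0.259, 0.1],
    'X': [572125, 572125, 571125, 571375],
    'Y': [6226875, 6226875, 6226125, 6226125],
})

# Rows whose (Time, Y, X) key already appeared earlier in the frame.
n_duplicates = int(df.duplicated(subset=['Time', 'Y', 'X']).sum())

# Distinct cells that are present, versus the full Y-by-X rectangle.
cells = df[['Y', 'X']].drop_duplicates()
n_expected = cells['Y'].nunique() * cells['X'].nunique()
n_missing = n_expected - len(cells)

print(n_duplicates, n_missing)
```

A non-zero duplicate count is why an aggregation such as `mean()` or `first()` is needed before `unstack()`, and missing cells are where the NaN values in the pivoted frame come from.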

If you actually want three indices (X/Y/Time), the same problems persist:

```python
import json
import pandas as pd

with open('json_data.json') as file:
    data = json.load(file)['data']

df = pd.json_normalize(data=data, record_path='Data', meta=['X', 'Y'])
df['Time'] = pd.to_datetime(df.Time)

# Keep Time as an index level: rows are (Y, Time) pairs, columns are X.
df = (
    df.set_index(['Y', 'X', 'Time'])
    .Value.groupby(level=['Y', 'X', 'Time'])
    .mean()
    .unstack(level='X')
)
```
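
With Time kept as an index level, a single timestep can be pulled out as an ordinary 2D Y-by-X frame. The sketch below assumes a small made-up long-format frame (`long` and its values are hypothetical) and uses `DataFrame.xs` to take the cross-section:

```python
import pandas as pd

# Hypothetical long-format data: one Y row, two X columns, two timesteps.
long = pd.DataFrame({
    'Time': pd.to_datetime(['2023-07-10T00:00:00', '2023-07-10T00:01:00'] * 2),
    'Value': [0.0, 0.259, 0.0, 0.321],
    'X': [571125, 571125, 571375, 571375],
    'Y': [6226125] * 4,
})

frame = (
    long.set_index(['Y', 'X', 'Time'])
    .Value.groupby(level=['Y', 'X', 'Time'])
    .mean()
    .unstack(level='X')
)

# Cross-section at one timestamp: the Time level is dropped, leaving Y rows and X columns.
snapshot = frame.xs(pd.Timestamp('2023-07-10T00:01:00'), level='Time')
print(snapshot)
```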

Based on your comments, what you actually want is

  • a frame where NaN are filled by zero;
  • an outer index of Time, then Y;
  • an inner (column) index of X; and
  • duplicates eliminated by first().

This means

```python
import json
import pandas as pd

with open('json_data.json') as file:
    data = json.load(file)['data']

df = pd.json_normalize(data=data, record_path='Data', meta=['X', 'Y'])
df['Time'] = pd.to_datetime(df.Time)

# first() keeps one value per (Time, Y, X); missing cells are filled with 0.
Z = (
    df.set_index(['Time', 'Y', 'X'])
    .Value.groupby(level=['Time', 'Y', 'X'])
    .first()
    .unstack(level='X', fill_value=0)
)
```

Don't loop.
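
If a plain NumPy array is needed in the end, a frame shaped like `Z` can be reshaped into (time, y, x) without looping. This is only a sketch under stated assumptions: the input is a made-up 2 x 2 grid with three timesteps (`times`, `cells` and the values are hypothetical), and the `reindex` step is an addition that makes sure every (Time, Y) pair is present, so that the reshape is valid even when coverage is irregular:

```python
import pandas as pd

# Hypothetical stand-in for json_object['data']: 2 x 2 cells, 3 timesteps.
times = ['2023-07-10T00:00:00', '2023-07-10T00:01:00', '2023-07-10T00:02:00']
cells = [(0, 0), (250, 0), (0, 250), (250, 250)]  # (X, Y) pairs
data = [
    {'X': x, 'Y': y,
     'Data': [{'Time': t, 'Value': x + y + k} for k, t in enumerate(times)]}
    for x, y in cells
]

df = pd.json_normalize(data=data, record_path='Data', meta=['X', 'Y'])
df['Time'] = pd.to_datetime(df['Time'])
df = df.astype({'X': int, 'Y': int})  # meta columns come back as object dtype

Z = (
    df.set_index(['Time', 'Y', 'X'])
    .Value.groupby(level=['Time', 'Y', 'X'])
    .first()
    .unstack(level='X', fill_value=0)
)

# Make the (Time, Y) row index a full Cartesian product before reshaping.
t_idx = Z.index.unique(level='Time')
y_idx = Z.index.unique(level='Y')
full = pd.MultiIndex.from_product([t_idx, y_idx], names=['Time', 'Y'])
Z = Z.reindex(index=full, fill_value=0)

# Rows are ordered Time-major, then Y, so this gives axes (time, y, x).
grid = Z.to_numpy().reshape(len(t_idx), len(y_idx), Z.shape[1])
print(grid.shape)
```

Here `grid[k]` is the 2D Y-by-X slice for the k-th timestamp, with the Y values in `y_idx` and the X values in `Z.columns` giving the coordinates of each axis.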

huangapple
  • Posted on 2023-07-13 22:24:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76680498.html