Python: 3D grid from JSON data
Question
I have been struggling with a Python-related problem for quite some time, and I think it is time to ask for some help or hints.
In the [attached file](https://aarhusvand-my.sharepoint.com/:u:/g/personal/anl_aarhusvand_dk/ESEC8xbP2OJCpLobZTRcGt8B09b_GpHzxcRJU2na1rAaJg?e=CVa0Tt) you will find a JSON file object with the data I'm working with. I'm loading in the data using the following lines of code:
```python
import json

with open('json_data.json', 'r') as openfile:
    json_object = json.load(openfile)
```
Looking into `json_object`, you will notice a lot of relevant information, such as the number of cells in the two spatial dimensions, `Numx` and `Numy`. The key I am interested in is `data`, i.e. `json_object['data']`. It is a list holding `Numx` by `Numy` dictionaries, each with an X and a Y coordinate. Each dictionary also holds the third dimension, time, under the key `Data`, which is a list of n timesteps of values. It may be obvious, but I should mention that the timestamps are the same for every coordinate.

So, to summarize: for each coordinate there is a time series of values, which I would like to convert into a 3D grid using `numpy`. How would I do that?
I've tried to convert the `data` key into a Pandas DataFrame with this:

```python
df = pd.json_normalize(json_object["data"], record_path=["Data"], meta=["X", "Y"])
```
This gives me a DataFrame with one value per timestep and coordinate, but I don't know how to continue from there. How do I turn this into a 3D grid?

I then tried looping over each timestamp, which gives me n 2D grids, one per timestep. But I can't work out how to bring in time as the third dimension.
```python
# Timestamps are identical for every coordinate, so read them from the first entry
timeStamps = [t['Time'] for t in json_object['data'][0]['Data']]

dfTimestamps = {}
for i, ts in enumerate(timeStamps):
    dfTimestamps[ts] = {}
    X = []
    Y = []
    vals = []
    # Collect the coordinates and the value at timestep i for every cell
    for d in json_object['data']:
        X.append(d['X'])
        Y.append(d['Y'])
        vals.append(d['Data'][i]['Value'])
    dfTimestamps[ts]['X'] = X
    dfTimestamps[ts]['Y'] = Y
    dfTimestamps[ts]['Value'] = vals
```
EDIT:
I will try to write some example data from the JSON file object below.
```python
{'info': {'Parameters': None,
          'Unit': 'mm/hour',
          'Location': 'Input geojson',
          'Point': {'IdPoints': 0,
                    'Name': None,
                    'Description': None,
                    'X': 571125,
                    'Y': 6225625,
                    'EPSG': None,
                    'Latitude': 0,
                    'Longitude': 0,
                    'CreatedDatetime': '0001-01-01T00:00:00',
                    'OrganizationId': None,
                    'ResponsibleUserId': None,
                    'Id': 0},
          'PointId': None,
          'ParameterId': 'Rainintensity, id: 204',
          'Timezone': 'UTC',
          'DataSource': 'X-band Sabro, RadarId: {303}',
          'EPSG': '32632',
          'CreatedDateTime': '2023-07-13T13:55:18.9099301Z',
          'AllPoints': False,
          'dxdy': 250,
          'Numx': 7,
          'Numy': 6,
          'X0': 0,
          'Y0': 0,
          'MissingSteps': 0,
          'ProcessQuality': [{'qualityIndex': 0,
                              'qualityDescription': 'No problem detected',
                              'qualitySteps': 3241,
                              'missingSteps': None,
                              'fromUTC': '2023-07-10T00:00:00',
                              'toUTC': '2023-07-12T06:00:00'}]},
 'data': [{'X': 572125,
           'Y': 6226875,
           'Data': [{'Time': '2023-07-10T00:00:00', 'Value': 0},
                    {'Time': '2023-07-10T00:01:00', 'Value': 0},
                    {'Time': '2023-07-10T00:02:00', 'Value': 0},
                    {'Time': '2023-07-10T00:03:00', 'Value': 0},
                    {'Time': '2023-07-10T00:04:00', 'Value': 0},
                    {'Time': '2023-07-10T00:05:00', 'Value': 0},
                    {'Time': '2023-07-10T00:06:00', 'Value': 0},
                    {'Time': '2023-07-10T00:07:00', 'Value': 0},
                    {'Time': '2023-07-10T00:08:00', 'Value': 0.399},
                    {'Time': '2023-07-10T00:09:00', 'Value': 0},
                    {'Time': '2023-07-10T00:10:00', 'Value': 0},
                    {'Time': '2023-07-10T00:11:00', 'Value': 0},
                    ...
                    {'Time': '2023-07-10T16:37:00', 'Value': 0.299},
                    {'Time': '2023-07-10T16:38:00', 'Value': 0},
                    {'Time': '2023-07-10T16:39:00', 'Value': 0},
                    ...]},
          {'X': 572125,
           'Y': 6226875,
           'Data': [{'Time': '2023-07-10T00:00:00', 'Value': 0},
                    {'Time': '2023-07-10T00:01:00', 'Value': 0},
                    {'Time': '2023-07-10T00:02:00', 'Value': 0},
                    {'Time': '2023-07-10T00:03:00', 'Value': 0},
                    {'Time': '2023-07-10T00:04:00', 'Value': 0},
                    {'Time': '2023-07-10T00:05:00', 'Value': 0},
                    {'Time': '2023-07-10T00:06:00', 'Value': 0},
                    {'Time': '2023-07-10T00:07:00', 'Value': 0},
                    {'Time': '2023-07-10T00:08:00', 'Value': 0.399},
                    {'Time': '2023-07-10T00:09:00', 'Value': 0},
                    {'Time': '2023-07-10T00:10:00', 'Value': 0},
                    ...
                    {'Time': '2023-07-10T16:37:00', 'Value': 0.299},
                    {'Time': '2023-07-10T16:38:00', 'Value': 0},
                    {'Time': '2023-07-10T16:39:00', 'Value': 0},
                    ...]},
          {'X': 571125,
           'Y': 6226125,
           'Data': [{'Time': '2023-07-10T00:00:00', 'Value': 0},
                    {'Time': '2023-07-10T00:01:00', 'Value': 0},
                    {'Time': '2023-07-10T00:02:00', 'Value': 0.259},
                    {'Time': '2023-07-10T00:03:00', 'Value': 0},
                    {'Time': '2023-07-10T00:04:00', 'Value': 0},
                    {'Time': '2023-07-10T00:05:00', 'Value': 0.321},
                    {'Time': '2023-07-10T00:06:00', 'Value': 0.279},
                    {'Time': '2023-07-10T00:07:00', 'Value': 0},
                    {'Time': '2023-07-10T00:08:00', 'Value': 0.371},
                    {'Time': '2023-07-10T00:09:00', 'Value': 0.399},
                    {'Time': '2023-07-10T00:10:00', 'Value': 0.345},
                    ...
```
So, at the beginning of the file there is the `info` key, which holds relevant metadata. The `data` key holds the information that I want to convert into a 3D grid. For each coordinate there is a time series as shown above, where I have included three coordinate instances as examples.
Answer 1
Score: 0
Your attempt

```python
df = pd.json_normalize(json_object["data"], record_path=["Data"], meta=["X", "Y"])
```

is basically correct. You then ask:
> how do I turn this into a 3D grid?
Perhaps you mean a 2D array of values, but this is complicated by many factors, including that you have a lot of duplicates and there is irregular grid coverage in your data:
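```python
import json
import pandas as pd

with open('json_data.json') as file:
    data = json.load(file)['data']

# One row per (coordinate, timestep), with X and Y carried along as columns
df = pd.json_normalize(data=data, record_path='Data', meta=['X', 'Y'])
df['Time'] = pd.to_datetime(df.Time)

# Average over time and over duplicate coordinates, then pivot X into columns
df = (
    df.set_index(['Y', 'X'])
    .Value.groupby(level=['Y', 'X'])
    .mean()
    .unstack(level='X')
)
```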
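To see how many duplicates and gaps there are before choosing an aggregation, a check along the following lines may help. This is only a sketch: the small `data` list is a made-up stand-in for `json.load(file)['data']` (a 2x2 grid with one repeated cell and one absent cell), and the names `coords`, `n_duplicates` and `missing` are illustrative.

```python
import pandas as pd

# Hypothetical miniature of the 'data' list: cell (0, 0) appears twice,
# cell (250, 250) is absent.
series = [{'Time': '2023-07-10T00:00:00', 'Value': 0.0},
          {'Time': '2023-07-10T00:01:00', 'Value': 0.1}]
data = [
    {'X': 0, 'Y': 0, 'Data': series},
    {'X': 0, 'Y': 0, 'Data': series},   # duplicate coordinate
    {'X': 250, 'Y': 0, 'Data': series},
    {'X': 0, 'Y': 250, 'Data': series},
]

coords = pd.DataFrame([(d['X'], d['Y']) for d in data], columns=['X', 'Y'])

# Number of entries whose (X, Y) pair has already been seen
n_duplicates = int(coords.duplicated().sum())

# Cells of the full rectangular grid that have no entry at all
full = pd.MultiIndex.from_product(
    [sorted(coords['Y'].unique()), sorted(coords['X'].unique())],
    names=['Y', 'X'],
)
present = pd.MultiIndex.from_frame(coords[['Y', 'X']].drop_duplicates())
missing = full.difference(present)

print(n_duplicates, list(missing))
```

If `n_duplicates` is non-zero you need to decide how to combine the repeated cells (for example `mean()` or `first()`), and any entries in `missing` will show up as NaN after `unstack`.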
If you actually want three indices (X/Y/Time), the same problems persist:
```python
import json
import pandas as pd

with open('json_data.json') as file:
    data = json.load(file)['data']

df = pd.json_normalize(data=data, record_path='Data', meta=['X', 'Y'])
df['Time'] = pd.to_datetime(df.Time)

# Keep Time as an index level; the mean now only merges duplicate coordinates
df = (
    df.set_index(['Y', 'X', 'Time'])
    .Value.groupby(level=['Y', 'X', 'Time'])
    .mean()
    .unstack(level='X')
)
```
Based on your comments, what you actually want is

- a frame where NaN values are filled with zero;
- an outer index of Time, then Y;
- an inner (column) index of X; and
- duplicates eliminated by `first()`.

This means:
```python
import json
import pandas as pd

with open('json_data.json') as file:
    data = json.load(file)['data']

df = pd.json_normalize(data=data, record_path='Data', meta=['X', 'Y'])
df['Time'] = pd.to_datetime(df.Time)

# Rows indexed by (Time, Y), columns by X; missing cells become 0
Z = (
    df.set_index(['Time', 'Y', 'X'])
    .Value.groupby(level=['Time', 'Y', 'X'])
    .first()
    .unstack(level='X', fill_value=0)
)
```
Don't loop.
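If you then need a plain `numpy` array rather than a frame, one possible route is to reindex onto the full Time x Y x X product and reshape. The following is a minimal sketch under stated assumptions: the tiny `df` is a hypothetical stand-in for the `json_normalize` output above, and the axis order `(time, y, x)` is a choice, not something the data dictates.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format frame: 2 timesteps, cell (X=250, Y=250) absent
df = pd.DataFrame({
    'Time': pd.to_datetime(['2023-07-10T00:00:00'] * 3
                           + ['2023-07-10T00:01:00'] * 3),
    'X': [0, 250, 0, 0, 250, 0],
    'Y': [0, 0, 250, 0, 0, 250],
    'Value': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
})

# One value per (Time, Y, X); first() drops duplicate coordinates
values = df.groupby(['Time', 'Y', 'X'])['Value'].first()

# Sorted axis labels for each dimension
times = values.index.get_level_values('Time').unique().sort_values()
ys = values.index.get_level_values('Y').unique().sort_values()
xs = values.index.get_level_values('X').unique().sort_values()

# Reindex onto the full rectangular product so every cell exists,
# filling absent cells with 0, then reshape to (n_time, n_y, n_x)
full = pd.MultiIndex.from_product([times, ys, xs], names=['Time', 'Y', 'X'])
grid = (
    values.reindex(full, fill_value=0)
    .to_numpy()
    .reshape(len(times), len(ys), len(xs))
)

print(grid.shape)
```

`grid[t, j, i]` is then the value at `times[t]`, `ys[j]`, `xs[i]`. Keeping `times`, `ys` and `xs` around preserves the mapping from array positions back to timestamps and coordinates.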