Parsing JSON response from API using DataFlow
Question
How do I parse a JSON response from an API into a CSV file?
Currently, the API response is stored as a JSON file in Blob Storage, and I now want to parse it and save it as a CSV file.
Here's a sample subset of the JSON response:
```json
{
  "reports": [
    {
      "columnHeader": {
        "dimensions": [
          "ga:dimension2",
          "ga:dimension1",
          "ga:dimension4",
          "ga:dimension6"
        ],
        "metricHeader": {
          "metricHeaderEntries": [
            {
              "name": "ga:newUsers",
              "type": "INTEGER"
            }
          ]
        }
      },
      "data": {
        "rows": [
          {
            "dimensions": [
              "Australia",
              "Hong Kong",
              "HKNG",
              "not set"
            ],
            "metrics": [
              {
                "values": [
                  "1"
                ]
              }
            ]
          },
          {
            "dimensions": [
              "Australia",
              "Malaysia",
              "KL",
              "not set"
            ],
            "metrics": [
              {
                "values": [
                  "1"
                ]
              }
            ]
          }
        ]
      }
    }
  ]
}
```
The desired output.csv file should have column headers for the data key of the JSON file.
Ideally, the column headers would look like this:
```
+------------+------------+------------+------------+--------------+
| dimension1 | dimension2 | dimension4 | dimension6 | metric_value |
+------------+------------+------------+------------+--------------+
| Australia  | Hong Kong  | HKNG       | not set    | 1            |
| ...        | ...        | ...        | ...        | ...          |
+------------+------------+------------+------------+--------------+
```
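The desired header names look like the columnHeader dimension names with the ga: prefix dropped, plus one metric column. Assuming that is the intended mapping, here is a minimal Python sketch (outside ADF, for illustration only) that derives the header row from the sample response; csv_header is a hypothetical helper, not part of any API.

```python
import json

# The columnHeader part of the sample response above, trimmed to what this sketch uses.
SAMPLE = """
{"reports": [{"columnHeader": {
    "dimensions": ["ga:dimension2", "ga:dimension1", "ga:dimension4", "ga:dimension6"],
    "metricHeader": {"metricHeaderEntries": [{"name": "ga:newUsers", "type": "INTEGER"}]}}}]}
"""


def csv_header(column_header: dict) -> list[str]:
    """Build the CSV header: dimension names without 'ga:', then one metric column."""
    dims = [name.removeprefix("ga:") for name in column_header["dimensions"]]
    # Lexicographic sort gives the desired dimension1, dimension2, dimension4, dimension6
    # order here; note it would put a name like dimension10 before dimension2.
    return sorted(dims) + ["metric_value"]


report = json.loads(SAMPLE)["reports"][0]
print(csv_header(report["columnHeader"]))
# ['dimension1', 'dimension2', 'dimension4', 'dimension6', 'metric_value']
```

str.removeprefix needs Python 3.9 or later.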
Answer 1
Score: 1
To parse a JSON response from an API and save it as a CSV file in Azure Data Factory (ADF) Data Flow, you can use the following steps:
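- Add a Source transformation to the data flow and use the JSON file in Blob Storage as the source dataset. In the JSON settings, set the documentForm option to match the file's layout: the sample response is a single top-level object, which corresponds to singleDocument; arrayOfDocuments is meant for files whose top level is an array of objects.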
- Add a Select transformation and map the data field of the first object in the reports array to a new column called data (array indexes in data flow expressions are 1-based, so reports[1] is the first element):
    data = reports[1].data
- Add a Flatten transformation and unroll the values array in the metrics field of the data column. Map the dimensions and values fields to new columns.
- Then add a Derived Column transformation to create new columns from the dimensions and values columns. Name them dimension2, dimension1, dimension4, dimension6, and metric_value so they match the headers in the desired output.
- Add another Select transformation that keeps only the dimension1, dimension2, dimension4, dimension6, and metric_value fields, in the column order of the desired output.
- Finally, add a Sink transformation that uses a CSV dataset.
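ADF data flows are configured in the designer rather than written as code, so purely as an illustration, here is a minimal Python sketch (not ADF, and not the answer's method) that performs the same reshaping on the sample response with the standard json and csv modules. The comments map each part to the corresponding transformation. It assumes each row carries exactly one metric value and that column names come from columnHeader; flatten_report, to_csv and OUTPUT_COLUMNS are hypothetical names. With the sample's columnHeader order (ga:dimension2 first), pairing by position puts Australia under dimension2, whereas the question's example table shows it under dimension1, so check which mapping is actually intended.

```python
import csv
import io
import json

# Sample response from the question (both rows), compacted onto a few lines.
SAMPLE = """
{"reports": [{
  "columnHeader": {
    "dimensions": ["ga:dimension2", "ga:dimension1", "ga:dimension4", "ga:dimension6"],
    "metricHeader": {"metricHeaderEntries": [{"name": "ga:newUsers", "type": "INTEGER"}]}},
  "data": {"rows": [
    {"dimensions": ["Australia", "Hong Kong", "HKNG", "not set"], "metrics": [{"values": ["1"]}]},
    {"dimensions": ["Australia", "Malaysia", "KL", "not set"], "metrics": [{"values": ["1"]}]}]}}]}
"""

# Column order of the desired output.
OUTPUT_COLUMNS = ["dimension1", "dimension2", "dimension4", "dimension6", "metric_value"]


def flatten_report(report: dict) -> list[dict]:
    # Header names come from columnHeader, minus the "ga:" prefix.
    names = [n.removeprefix("ga:") for n in report["columnHeader"]["dimensions"]]
    records = []
    # "Flatten": one output record per element of data.rows.
    for row in report["data"]["rows"]:
        # "Derived Column": pair each dimension value with its header by position.
        record = dict(zip(names, row["dimensions"]))
        # Assumption: one metric per row, i.e. the first entry of metrics[0].values.
        record["metric_value"] = row["metrics"][0]["values"][0]
        records.append(record)
    return records


def to_csv(records: list[dict]) -> str:
    # "Select" + "Sink": keep only OUTPUT_COLUMNS, in that order, and write CSV text.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=OUTPUT_COLUMNS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Python lists are 0-based, so reports[0] here is what reports[1] selects in the
# data flow expression (data flow arrays are 1-based).
report = json.loads(SAMPLE)["reports"][0]
print(to_csv(flatten_report(report)), end="")
# dimension1,dimension2,dimension4,dimension6,metric_value
# Hong Kong,Australia,HKNG,not set,1
# Malaysia,Australia,KL,not set,1
```

Pairing names with values by position, rather than hard-coding which index is dimension1, keeps the mapping correct if the API returns the dimensions in a different order.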
Output:
When the pipeline containing this data flow runs, the JSON data is written to the CSV file in the required format.
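To confirm the sink wrote the file in the required format, a quick local check can help. This is a minimal Python sketch, assuming the sink's output has been downloaded to a local file and that the CSV dataset writes a header row; EXPECTED_HEADER is taken from the question's desired output, and check_output is a hypothetical helper, not part of ADF.

```python
import csv
import tempfile
from pathlib import Path

EXPECTED_HEADER = ["dimension1", "dimension2", "dimension4", "dimension6", "metric_value"]


def check_output(path: Path) -> int:
    """Check the header and every record of a CSV file; return the number of data records."""
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header != EXPECTED_HEADER:
            raise ValueError(f"unexpected header: {header}")
        count = 0
        for row in reader:
            if len(row) != len(EXPECTED_HEADER):
                raise ValueError(
                    f"line {reader.line_num}: expected {len(EXPECTED_HEADER)} fields, got {len(row)}"
                )
            # metricHeader declares ga:newUsers as INTEGER; int() raises ValueError otherwise.
            int(row[-1])
            count += 1
        return count


# Demo on a small stand-in file; point check_output at the downloaded sink output instead.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "output.csv"
    demo.write_text(
        "dimension1,dimension2,dimension4,dimension6,metric_value\n"
        "Hong Kong,Australia,HKNG,not set,1\n",
        encoding="utf-8",
    )
    print(check_output(demo))  # 1
```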