Use jq to unify unknown JSON to known schema
Question
Edit: https://jqplay.org/s/uWC80Qoixxd
I'm not really sure how to phrase this question, so let me give an example: I have a large number of files in two JSON document formats. Most of the contents, apart from one object, are irrelevant to me. I want to create a normalised version of each file. These are the two objects I care about (one in each format):
{
"title": "Some data",
"data": [
{
"id": "123",
...
},
{
"id": "abc",
...
}
]
}
and
{
"title": "Some more data",
"data": [
{
"ids": [
{
"id": "123",
...
},
{
"id": "abc",
...
}
],
"names": [
{
"name": "A",
...
},
{
"name": "B",
...
}
]
}
]
}
Each of those "object formats" is an object inside a JSON array in a file. I want to convert each of the files I have into a list of objects, each capturing the `title`, the list of `id`s and the list of `name`s in a single object:
{
"title": "Some more data",
"ids": [
"123",
"abc"
],
"names": [
"A",
"B"
]
}
I use the following `jq`, but it doesn't work (it creates multiple objects with the same title, one per `name` or `id`):
for f in $(find * -wholename "*.json" | sort); do
cat $f | jq '..
| if type == "object" then
if has("data") then {
"name": .title,
"ids": (.data[] | [
if has("id") then {
"id": .id
} else if has("ids") then {
"ids": .ids[],
"names": .names?
} else null end
end
])} else null end
else null end
| select(type != "null")' > "$f" ; done
Answer 1
Score: 1
You could iterate over the outer array using `.[]`, then construct the objects using `? //` to fall back to an alternative when the first expression produces no usable result (an error suppressed by `?`, or only `null` or `false`).
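As a small illustration of that fallback, the sketch below runs the `.ids[]? // .` pattern on two hypothetical items, one shaped like each input format. The variable names `flat` and `nested` are made up for this example.

```shell
# Two hypothetical items, one per input format (names are illustrative only).
flat='{"id":"123"}'
nested='{"ids":[{"id":"123"},{"id":"abc"}]}'

# The flat item has no .ids, so .ids[] raises an error; ? suppresses it,
# the left side yields nothing, and // falls back to the item itself (.).
printf '%s' "$flat" | jq -c '.ids[]? // . | .id'

# The nested item yields each element of .ids, so .id is read from those.
printf '%s' "$nested" | jq -c '.ids[]? // . | .id'
```

Because `|` binds more loosely than `//`, the filter reads as `(.ids[]? // .) | .id`.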
If you are okay with `null`s in the complete absence of a key (as with `.name` in your first format), try this:
.[] | {title} + (.data | {
ids: map(.ids[]? // . | .id),
names: map(.names[]? // . | .name)
})
{
"title": "Some data",
"ids": [
"123",
"abc"
],
"names": [
null,
null
]
}
{
"title": "Some more data",
"ids": [
"123",
"abc"
],
"names": [
"A",
"B"
]
}
But you could also filter out `null`s using `values`:
.[] | {title} + (.data | {
ids: map(.ids[]? // . | .id | values),
names: map(.names[]? // . | .name | values)
})
{
"title": "Some data",
"ids": [
"123",
"abc"
],
"names": []
}
{
"title": "Some more data",
"ids": [
"123",
"abc"
],
"names": [
"A",
"B"
]
}
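To see what `values` does in isolation, here is a minimal sketch on made-up input. `values` lets every input through except `null`, so inside `map` it acts as a null filter.

```shell
# values keeps everything except null (false is kept).
jq -nc '[1, null, "a", null, false] | map(values)'

# Applied to items that have no .name key, every result is null and is dropped.
jq -nc '[{"id":"123"},{"id":"abc"}] | map(.name | values)'
```

The second command prints an empty array, which is why `names` ends up as `[]` for the first format.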
If you want to get rid of keys with empty arrays altogether, filter them out using `map_values` with a `select` comparison:
.[] | {title} + (.data | {
ids: map(.ids[]? // . | .id | values),
names: map(.names[]? // . | .name | values)
} | map_values(select(. != [])))
{
"title": "Some data",
"ids": [
"123",
"abc"
]
}
{
"title": "Some more data",
"ids": [
"123",
"abc"
],
"names": [
"A",
"B"
]
}
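The key-dropping step can be tried on its own. This is a sketch on a made-up object, assuming a jq release in which an empty result inside `map_values` removes the key, which the output above implies.

```shell
# select(. != []) produces nothing for the empty array,
# so map_values removes the "names" key and keeps "ids".
jq -nc '{ids: ["123","abc"], names: []} | map_values(select(. != []))'
```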
Edit using the modified input files: As the deeper levels use the same (relative) path (here `.specs[].spec`), we need some other distinguishing criterion to rule out the level with "Some title you don't care about". Checking for the presence of a `.data` key seems to fit the new sample data.
.specs[].spec | select(has("data")), .specs[]?.spec
| {title} + (.data | {
ids: map(.ids[]?.id // .id | values),
names: map(.names[]? // . | .name | values)
} | map_values(select(. != [])))
{
"title": "Some data",
"ids": [
"123",
"abc"
]
}
{
"title": "Some more data",
"ids": [
"123",
"abc"
],
"names": [
"A",
"B"
]
}
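For applying a filter like the earlier `.[] | ...` one to many files, note that redirecting jq's output straight back into its input file (as in the question's loop) truncates the file before jq reads it. A possible workaround, sketched here with a hypothetical helper named `normalise_file`, is to write to a temporary file first. It assumes each file holds a top-level array, as in the first version of the question.

```shell
# Hypothetical helper: normalise one file in place via a temporary file,
# so the input is never truncated before jq has read it.
normalise_file() {
  local f=$1 tmp
  tmp=$(mktemp) || return 1
  if jq '.[] | {title} + (.data | {
        ids:   map(.ids[]?   // . | .id   | values),
        names: map(.names[]? // . | .name | values)
      } | map_values(select(. != [])))' "$f" > "$tmp"; then
    # Copy the contents back so the original file keeps its permissions.
    cat "$tmp" > "$f"
    rm -f "$tmp"
  else
    # Leave the original untouched if jq failed.
    rm -f "$tmp"
    return 1
  fi
}
```

It could then be called in a loop such as `for f in ./*.json; do normalise_file "$f"; done`. The result is a stream of objects; wrapping the whole filter in `[...]` would produce a single JSON array instead.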
Answer 2
Score: 1
If you are okay with having `names: null` or `names: []` in the final document for your first example, the following looks like a simple solution:
{ title }
+ (.data | {
ids: map(.ids[].id),
names: (map(.names[].name)? // []) # or // null
})
or equivalent:
{
title,
ids: (.data | map(.ids[].id)),
names: (.data | map(.names[].name)? // [])
}
Output 1:
{
"title": "Some more data",
"ids": [
"123",
"abc"
],
"names": []
}
Output 2:
{
"title": "Some more data",
"ids": [
"123",
"abc"
],
"names": [
"A",
"B"
]
}