Use jq to unify unknown JSON to known schema
Question
I'm not really sure how to phrase this question, so lemme give an example: I have a large number of files in two JSON document formats. Most of the contents, apart from one object, are irrelevant to me. I want to create a normalised version of each file. These are the two objects I care about (one per format):
{
    "title": "Some data",
    "data": [
        {
            "id": "123",
            ...
        },
        {
            "id": "abc",
            ...
        }
    ]
}
and
{
  "title": "Some more data",
  "data": [
    {
      "ids": [
        {
          "id": "123",
          ...
        },
        {
          "id": "abc",
          ...
        }
      ],
      "names": [
        {
          "name": "A",
          ...
        },
        {
          "name": "B",
          ...
        }
      ]
    }
  ]
}
Each of those "object formats" is an object inside a JSON array in a file. I want to convert each of the files I have into a list of objects, where each object captures the title, the list of id values and the list of name values:
{
  "title": "Some more data",
  "ids": [
      "123",
      "abc"
  ],
  "names": [
      "A",
      "B"
  ]
}
I use the following jq, but it doesn't work (it creates multiple objects with the same title, one per name or id):
for f in $(find * -wholename "*.json" | sort); do
cat $f | jq '..
| if type == "object" then
    if has("data") then {
        "name": .title,
        "ids": (.data[] | [
            if has("id") then {
                "id": .id
            } else if has("ids") then {
                "ids": .ids[],
                "names": .names?
            } else null end
        end
    ])} else null end
else null end
| select(type != "null")' > "$f" ; done
Edit: https://jqplay.org/s/uWC80Qoixxd
Answer 1
Score: 1
You could iterate over the outer array using .[], then construct the objects using ? // to fall back to an alternative whenever the first expression fails or produces no usable value (null, false, or nothing at all).
If you are okay with nulls in the complete absence of a key (as with .name in your first format), try this:
.[] | {title} + (.data | {
  ids: map(.ids[]? // . | .id),
  names: map(.names[]? // . | .name)
})
{
  "title": "Some data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    null,
    null
  ]
}
{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    "A",
    "B"
  ]
}
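To see why the same expression handles both item shapes, here is a minimal sketch that runs `.ids[]? // . | .id` on one item of each kind. The two sample items are trimmed-down assumptions based on the formats in the question, and the variable names are only illustrative.

```shell
# Item shaped like the first format: there is no .ids key, so .ids[]?
# produces nothing and // falls back to the item itself (.).
flat=$(echo '{"id": "123"}' | jq -c '.ids[]? // . | .id')

# Item shaped like the second format: .ids[]? produces each nested object,
# so the fallback is never used.
nested=$(echo '{"ids": [{"id": "123"}, {"id": "abc"}]}' | jq -c '.ids[]? // . | .id')

printf '%s\n' "$flat" "$nested"
```

In both cases the pipe into `.id` applies to whatever the `//` expression produced, because `|` binds more loosely than `//`.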
But you could also filter out nulls using values:
.[] | {title} + (.data | {
  ids: map(.ids[]? // . | .id | values),
  names: map(.names[]? // . | .name | values)
})
{
  "title": "Some data",
  "ids": [
    "123",
    "abc"
  ],
  "names": []
}
{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    "A",
    "B"
  ]
}
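As a small illustration of what `values` does inside `map`, the sketch below feeds it a made-up array in which one element has no `.name` key. The input is an assumption for demonstration only.

```shell
# .name is null for the middle element; values drops nulls, so map only
# collects the names that actually exist.
kept=$(echo '[{"name": "A"}, {"id": "123"}, {"name": "B"}]' | jq -c 'map(.name | values)')

echo "$kept"
```

`values` behaves like `select(. != null)`, which is why the array shrinks instead of holding null placeholders.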
If you want to get rid of keys with empty arrays altogether, filter them out using map_values with a select that compares each value against []:
.[] | {title} + (.data | {
  ids: map(.ids[]? // . | .id | values),
  names: map(.names[]? // . | .name | values)
} | map_values(select(. != [])))
{
  "title": "Some data",
  "ids": [
    "123",
    "abc"
  ]
}
{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    "A",
    "B"
  ]
}
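The pruning step can be tried in isolation. This is a sketch on a hand-written object, assuming jq 1.6 or later (older releases may treat an update that produces no output differently).

```shell
# select(. != []) produces nothing for the empty "names" array, and
# map_values drops keys whose update produces no output.
pruned=$(echo '{"ids": ["123", "abc"], "names": []}' | jq -c 'map_values(select(. != []))')

echo "$pruned"
```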
Edit using the modified input files: As the deeper levels use the same (relative) path (here .specs[].spec), we need another distinguishing criterion to rule out the level with "Some title you don't care about". Checking for the presence of a .data key seems to fit the new sample data.
.specs[].spec | select(has("data")), .specs[]?.spec
| {title} + (.data | {
  ids: map(.ids[]?.id // .id | values),
  names: map(.names[]? // . | .name | values)
} | map_values(select(. != [])))
{
  "title": "Some data",
  "ids": [
    "123",
    "abc"
  ]
}
{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    "A",
    "B"
  ]
}
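For applying a filter like this to many files in place, one possible approach is sketched below. It uses the earlier `.[]`-based filter on two made-up sample files in a temporary directory; the directory, file names and sample contents are assumptions, not the real data. The result is written to a temporary file first, because redirecting jq's output straight to the file it is reading can truncate the input before it has been read.

```shell
# Hypothetical working directory with one sample file per format.
workdir=$(mktemp -d)
cat > "$workdir/a.json" <<'EOF'
[{"title": "Some data", "data": [{"id": "123"}, {"id": "abc"}]}]
EOF
cat > "$workdir/b.json" <<'EOF'
[{"title": "Some more data", "data": [{"ids": [{"id": "123"}, {"id": "abc"}],
  "names": [{"name": "A"}, {"name": "B"}]}]}]
EOF

# The filter from this answer, kept in a variable for reuse.
filter='.[] | {title} + (.data | {
  ids: map(.ids[]? // . | .id | values),
  names: map(.names[]? // . | .name | values)
} | map_values(select(. != [])))'

# Normalise each file: write to a temp file, then replace the original
# only if jq succeeded.
find "$workdir" -name '*.json' | sort | while IFS= read -r f; do
  jq "$filter" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cat "$workdir/a.json" "$workdir/b.json"
```

Reading the file names line by line also avoids the word splitting that `for f in $(find ...)` performs on names containing spaces.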
Answer 2
Score: 1
If you are okay with having names: null or names: [] in the final document for your first example, the following looks like a simple solution:
{ title }
+ (.data | {
    ids: map(.ids[].id),
    names: (map(.names[].name)? // []) # or // null
})
or equivalent:
{
    title,
    ids: (.data | map(.ids[].id)),
    names: (.data | map(.names[].name)? // [])
}
Output 1:
{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": []
}
Output 2:
{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    "A",
    "B"
  ]
}
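The `(...)? // []` fallback used for names can be checked on its own. The sketch below runs it on two hand-written `.data` arrays, one without a `names` key and one with it; both inputs are assumptions chosen to mirror the two outputs above.

```shell
# No .names key: iterating null raises an error, ? suppresses it, and
# // [] supplies the empty array instead.
without_names=$(echo '[{"ids": [{"id": "123"}]}]' | jq -c 'map(.names[].name)? // []')

# .names present: map succeeds and the fallback is not used.
with_names=$(echo '[{"names": [{"name": "A"}, {"name": "B"}]}]' | jq -c 'map(.names[].name)? // []')

printf '%s\n' "$without_names" "$with_names"
```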