June 5, 2023 16:15:56

Use jq to unify unknown JSON to known schema

Question

I'm not really sure how to phrase this question, so let me give an example: I have two types of JSON document formats across a large number of files. Apart from one object, most of the contents are irrelevant to me. I want to create a normalised version of each file. These are the two objects I care about (one in each format):

{
    "title": "Some data",
    "data": [
        {
            "id": "123",
            ...
        },
        {
            "id": "abc",
            ...
        }
    ]
}

and

{
  "title": "Some more data",
  "data": [
    {
      "ids": [
        {
          "id": "123",
          ...
        },
        {
          "id": "abc",
          ...
        }
      ],
      "names": [
        {
          "name": "A",
          ...
        },
        {
          "name": "B",
          ...
        }
      ]
    }
  ]
}

Each of those "object formats" is an object inside a JSON array in a file. I want to convert each of my files into a list of objects, where each object captures the title, the list of ids and the list of names:

{
  "title": "Some more data",
  "ids": [
      "123",
      "abc"
  ],
  "names": [
      "A",
      "B"
  ]
}

I use the following jq, but it doesn't work (it creates multiple objects with the same title, one per name or id):

for f in $(find * -wholename "*.json" | sort); do
cat $f | jq '..
| if type == "object" then
    if has("data") then {
        "name": .title,
        "ids": (.data[] | [
            if has("id") then {
                "id": .id
            } else if has("ids") then {
                "ids": .ids[],
                "names": .names?
            } else null end
        end
    ])} else null end
else null end
| select(type != "null")' > "$f" ; done
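The duplication comes from how jq builds objects. If a value expression such as (.data[] | [...]) emits several results, the object constructor emits one object per result. The following is a minimal sketch that uses a trimmed-down stand-in for the first format, with the ... fields dropped. It assumes jq is on PATH.

```shell
#!/bin/sh
# Minimal sketch: requires jq on PATH. The sample is a simplified stand-in
# for the first format in the question (the "..." fields are dropped).
set -eu

input='{"title":"Some data","data":[{"id":"123"},{"id":"abc"}]}'

# .data[] is a generator: the constructor runs once per element,
# so the title is repeated in each emitted object.
per_element=$(printf '%s' "$input" | jq -c '{title, ids: (.data[] | [.id])}')

# map(...) collects every id into one array, so exactly one object results.
collected=$(printf '%s' "$input" | jq -c '{title, ids: (.data | map(.id))}')

printf '%s\n\n%s\n' "$per_element" "$collected"
```

The first command prints two objects, both titled "Some data". The second prints a single object whose ids array holds both values. Collecting the results with map (or by wrapping the whole generator in [ ... ]) is what the answers below rely on.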

EDIT: https://jqplay.org/s/uWC80Qoixxd.
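The loop itself also has two shell pitfalls, separate from the jq filter:

- for f in $(find ...) splits file names on whitespace.
- Redirecting into the same file that is being read (cat $f | jq ... > "$f") can truncate the file before jq has read it.

The following is a minimal POSIX sh sketch that writes to a temporary file and then renames it. It assumes jq is on PATH. The function name normalize_dir and the .tmp. suffix are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes POSIX sh and jq on PATH; names are illustrative.
set -eu

# normalize_dir DIR FILTER: apply the jq FILTER to every *.json file
# under DIR and replace each file with the result.
normalize_dir() {
  dir=$1
  filter=$2
  # find -exec ... {} + passes each path as a separate argument,
  # so names containing spaces are not split.
  find "$dir" -type f -name '*.json' -exec sh -c '
    filter=$1
    shift
    for f do
      tmp="$f.tmp.$$"
      # Write to a temporary file first: redirecting straight into "$f"
      # can truncate it before jq has read it.
      if jq "$filter" "$f" > "$tmp"; then
        mv "$tmp" "$f"
      else
        rm -f "$tmp"
        exit 1
      fi
    done
  ' sh "$filter" {} +
}
```

For example, normalize_dir . "$FILTER" would apply a filter held in a hypothetical FILTER variable, such as one of the filters from the answers. The originals are overwritten, so try it on a copy of the files first.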

Answer 1

Score: 1

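You could iterate over the outer array using .[], then construct the objects using ? together with // to provide an alternative. The ? suppresses the error raised when iterating a missing key. The // falls back to its right-hand side when the left-hand side produces no output other than null or false.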

If you are okay with nulls where a key is completely absent (as with .name in your first format), try this:

.[] | {title} + (.data | {
  ids: map(.ids[]? // . | .id),
  names: map(.names[]? // . | .name)
})

{
  "title": "Some data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    null,
    null
  ]
}
{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    "A",
    "B"
  ]
}

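The following sketch runs the filter above locally instead of in jqplay. It assumes both formats sit in one top-level array, which the leading .[] expects. The ... fields are left out of the sample, and jq is assumed to be on PATH.

```shell
#!/bin/sh
# Minimal sketch: requires jq on PATH. Hypothetical sample holding both
# formats from the question in one top-level array ("..." fields omitted).
set -eu

sample='[
  {"title": "Some data", "data": [{"id": "123"}, {"id": "abc"}]},
  {"title": "Some more data",
   "data": [{"ids": [{"id": "123"}, {"id": "abc"}],
             "names": [{"name": "A"}, {"name": "B"}]}]}
]'

# On an element without "ids", .ids is null and iterating null is an error;
# ? turns that into no output, so // falls back to the element itself.
result=$(printf '%s' "$sample" | jq -c '
  .[] | {title} + (.data | {
    ids: map(.ids[]? // . | .id),
    names: map(.names[]? // . | .name)
  })')

printf '%s\n' "$result"
```

This prints one compact object per input object. For the first format, names comes out as [null,null], which matches the output shown above.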

But you could also filter out nulls using values:

.[] | {title} + (.data | {
  ids: map(.ids[]? // . | .id | values),
  names: map(.names[]? // . | .name | values)
})

{
  "title": "Some data",
  "ids": [
    "123",
    "abc"
  ],
  "names": []
}
{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    "A",
    "B"
  ]
}

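values is a jq builtin equivalent to select(. != null), so it drops nulls but keeps false. The following small sketch applies it to simplified elements shaped like the first format. It assumes jq is on PATH.

```shell
#!/bin/sh
# Minimal sketch: requires jq on PATH. Elements shaped like the first
# format (no "names" key), simplified from the question.
set -eu

elems='[{"id":"123"},{"id":"abc"}]'

# Without values: one null per element that has no "name".
with_nulls=$(printf '%s' "$elems" | jq -c 'map(.names[]? // . | .name)')

# values is select(. != null), so those nulls are removed.
without_nulls=$(printf '%s' "$elems" | jq -c 'map(.names[]? // . | .name | values)')

# Only null is removed; false and other values pass through.
mixed=$(jq -n -c '[1, null, "a", false] | map(values)')

printf '%s\n%s\n%s\n' "$with_nulls" "$without_nulls" "$mixed"
```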

If you want to drop keys whose value is an empty array altogether, filter them out with map_values and a select on the comparison:

.[] | {title} + (.data | {
  ids: map(.ids[]? // . | .id | values),
  names: map(.names[]? // . | .name | values)
} | map_values(select(. != [])))

{
  "title": "Some data",
  "ids": [
    "123",
    "abc"
  ]
}
{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    "A",
    "B"
  ]
}

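The following quick check runs the key-dropping step on the intermediate object for the first format. map_values(f) is .[] |= f, and when f produces no output the key is deleted. That behaviour is assumed to hold in jq 1.6 or later; older versions are not covered here. with_entries is shown as an alternative that spells out the filtering on key/value pairs.

```shell
#!/bin/sh
# Minimal sketch: requires jq (1.6 or later assumed) on PATH.
set -eu

# The {ids, names} object built for the first format, before {title} is added.
obj='{"ids":["123","abc"],"names":[]}'

# select(...) yields nothing for [], so |= deletes that key.
via_map_values=$(printf '%s' "$obj" | jq -c 'map_values(select(. != []))')

# Same result by filtering the entries explicitly.
via_with_entries=$(printf '%s' "$obj" | jq -c 'with_entries(select(.value != []))')

printf '%s\n%s\n' "$via_map_values" "$via_with_entries"
```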

Edit, using the modified input files: as the deeper levels use the same (relative) path (here .specs[].spec), we need some other criterion to rule out the level with "Some title you don't care about". Checking for the presence of a .data key seems to fit the new sample data.

.specs[].spec | select(has("data")), .specs[]?.spec
| {title} + (.data | {
  ids: map(.ids[]?.id // .i | values),
  names: map(.names[]? // . | .name | values)
} | map_values(select(. != [])))

{
  "title": "Some data",
  "ids": [
    "123",
    "abc"
  ]
}
{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    "A",
    "B"
  ]
}


Answer 2

Score: 1

If you are okay with having names: null or names: [] in the final document for your first example, the following looks like a simple solution:

{ title }
+ (.data | {
    ids: map(.ids[]?.id // .id),  # fall back to .id for the first format
    names: (map(.names[].name)? // []) # or // null
})

or, equivalently:

{
    title,
    ids: (.data | map(.ids[]?.id // .id)),
    names: (.data | map(.names[].name)? // [])
}

Output 1:

{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": []
}

Output 2:

{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    "A",
    "B"
  ]
}
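In the first format, the elements have an id key but no ids key, so iterating the missing key with .ids[] raises an error. The following sketch compares that strict expression with a fallback of the form .ids[]?.id // .id, which borrows the ? and // fallback from the first answer. The samples are simplified from the question, and jq is assumed to be on PATH.

```shell
#!/bin/sh
# Minimal sketch: requires jq on PATH; samples are simplified versions of
# the two formats from the question.
set -eu

fmt1='{"title":"Some data","data":[{"id":"123"},{"id":"abc"}]}'
fmt2='{"title":"Some more data","data":[{"ids":[{"id":"123"},{"id":"abc"}],"names":[{"name":"A"},{"name":"B"}]}]}'

# .ids[]? yields nothing when "ids" is missing, so // falls back to .id;
# map(.names[].name)? swallows the error and // [] supplies the default.
filter='{
  title,
  ids: (.data | map(.ids[]?.id // .id)),
  names: (.data | map(.names[].name)? // [])
}'

out1=$(printf '%s' "$fmt1" | jq -c "$filter")
out2=$(printf '%s' "$fmt2" | jq -c "$filter")

# The strict form errors on format 1 because .ids is null there.
if printf '%s' "$fmt1" | jq -c '.data | map(.ids[].id)' >/dev/null 2>&1; then
  strict_ok=yes
else
  strict_ok=no
fi

printf '%s\n%s\nstrict on format 1: %s\n' "$out1" "$out2" "$strict_ok"
```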
