Use jq to unify unknown JSON to known schema

Question

I'm not really sure how to phrase this question, so let me give an example: I have a large number of files in two JSON document formats. Most of the contents, apart from one object, are irrelevant to me. I want to create a normalised version of each file. These are the two objects I care about (one per format):

{
    "title": "Some data",
    "data": [
        {
            "id": "123",
            ...
        },
        {
            "id": "abc",
            ...
        }
    ]
}

and

{
  "title": "Some more data",
  "data": [
    {
      "ids": [
        {
          "id": "123",
          ...
        },
        {
          "id": "abc",
          ...
        }
      ],
      "names": [
        {
          "name": "A",
          ...
        },
        {
          "name": "B",
          ...
        }
      ]
    }
  ]
}

Each of those "object formats" is an object inside a JSON array in a file. I want to convert each of my files into a list of objects, where each object captures the title, the list of id values and the list of name values:

{
  "title": "Some more data",
  "ids": [
      "123",
      "abc"
  ],
  "names": [
      "A",
      "B"
  ]
}

I use the following jq, but it doesn't work (it creates multiple objects with the same title, one per name or id):

for f in $(find * -wholename "*.json" | sort); do
cat $f | jq '..
| if type == "object" then
    if has("data") then {
        "name": .title,
        "ids": (.data[] | [
            if has("id") then {
                "id": .id
            } else if has("ids") then {
                "ids": .ids[],
                "names": .names?
            } else null end
        end
    ])} else null end
else null end
| select(type != "null")' > "$f" ; done

EDIT: https://jqplay.org/s/uWC80Qoixxd.

Answer 1

Score: 1

You could iterate over the outer array using .[], then construct the objects using ? // to fall back to an alternative when a path is missing or yields only null.

If you are okay with nulls in the complete absence of a key (as with .name in your first format), try this:

.[] | {title} + (.data | {
  ids: map(.ids[]? // . | .id),
  names: map(.names[]? // . | .name)
})
{
  "title": "Some data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    null,
    null
  ]
}
{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    "A",
    "B"
  ]
}
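
For readers less familiar with the .ids[]? // . idiom, here is a rough Python rendition of the filter above. It is only a sketch: the names entries and normalize are made up for illustration, the sample inputs are trimmed versions of the two formats from the question, and it assumes every element of data is a JSON object.

```python
def entries(item, key):
    """Rough analogue of `.KEY[]? // .`: the elements of item[key] if it is a
    non-empty list, otherwise the item itself."""
    nested = item.get(key)
    if isinstance(nested, list) and nested:
        return nested
    return [item]


def normalize(doc):
    """Rough analogue of `{title} + (.data | {ids: ..., names: ...})` for one object."""
    data = doc["data"]
    return {
        "title": doc.get("title"),
        # A missing field becomes None, like null in jq.
        "ids": [e.get("id") for item in data for e in entries(item, "ids")],
        "names": [e.get("name") for item in data for e in entries(item, "names")],
    }


# Trimmed sample inputs (assumption: extra fields omitted).
first = {"title": "Some data", "data": [{"id": "123"}, {"id": "abc"}]}
second = {
    "title": "Some more data",
    "data": [{
        "ids": [{"id": "123"}, {"id": "abc"}],
        "names": [{"name": "A"}, {"name": "B"}],
    }],
}

for doc in (first, second):
    print(normalize(doc))
```

In the first format there is no names array, so each data item itself is asked for .name, which is why two nulls show up there.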


But you could also filter out nulls using values:

.[] | {title} + (.data | {
  ids: map(.ids[]? // . | .id | values),
  names: map(.names[]? // . | .name | values)
})
{
  "title": "Some data",
  "ids": [
    "123",
    "abc"
  ],
  "names": []
}
{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    "A",
    "B"
  ]
}
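
The effect of values can be pictured as a filter that drops null and nothing else. A minimal Python sketch, where the function name values is borrowed from jq purely for illustration:

```python
def values(stream):
    """Rough analogue of jq's `values`: pass everything through except null (None)."""
    return [v for v in stream if v is not None]


names = values([None, None])   # what the first format produced for names
ids = values(["123", "abc"])   # non-null entries are untouched
print(names)
print(ids)
```

Note that only null is removed; other values such as false or empty strings stay in the result.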


If you want to get rid of keys with empty arrays altogether, use map_values with a select that keeps only values different from []:

.[] | {title} + (.data | {
  ids: map(.ids[]? // . | .id | values),
  names: map(.names[]? // . | .name | values)
} | map_values(select(. != [])))
{
  "title": "Some data",
  "ids": [
    "123",
    "abc"
  ]
}
{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    "A",
    "B"
  ]
}
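
What map_values(select(. != [])) does to the inner object can be sketched in Python as a dictionary filter. The helper name drop_empty_arrays is hypothetical:

```python
def drop_empty_arrays(obj):
    """Rough analogue of `map_values(select(. != []))`:
    keep only the keys whose value is not an empty list."""
    return {key: value for key, value in obj.items() if value != []}


# The inner object as it would look for the first format after `values`.
extracted = {"ids": ["123", "abc"], "names": []}

# Like the jq filter, the title is merged in after the empty keys are dropped.
result = {"title": "Some data", **drop_empty_arrays(extracted)}
print(result)
```

Because the filter runs on the inner object before it is added to {title}, the title key is never at risk of being removed.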



Edit using the modified input files: As the deeper levels use the same (relative) path (here .specs[].spec), we need some other criterion to rule out the level with "Some title you don't care about". Checking for the presence of a .data key seems to fit the new sample data.

.specs[].spec | select(has("data")), .specs[]?.spec
| {title} + (.data | {
  ids: map(.ids[]?.id // .id | values),
  names: map(.names[]? // . | .name | values)
} | map_values(select(. != [])))
{
  "title": "Some data",
  "ids": [
    "123",
    "abc"
  ]
}
{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    "A",
    "B"
  ]
}
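
The first line of that filter, which picks the objects to process, might be pictured in Python as follows. This is a sketch under assumptions: the modified input is not reproduced in this thread, so the document below is a hypothetical structure that merely matches the description (a top level without data whose nested specs use the same .specs[].spec path), and candidate_specs is an invented name.

```python
def candidate_specs(doc):
    """Rough analogue of `.specs[].spec | select(has("data")), .specs[]?.spec`."""
    found = []
    for wrapper in doc["specs"]:
        spec = wrapper["spec"]
        if "data" in spec:                     # select(has("data"))
            found.append(spec)
        for inner in spec.get("specs") or []:  # .specs[]?.spec
            found.append(inner["spec"])
    return found


# Hypothetical input; the real modified sample is not shown here.
doc = {
    "specs": [
        {"spec": {
            "title": "Some title you don't care about",
            "specs": [
                {"spec": {"title": "Some data",
                          "data": [{"id": "123"}, {"id": "abc"}]}},
            ],
        }},
        {"spec": {"title": "Some more data",
                  "data": [{"ids": [{"id": "123"}, {"id": "abc"}]}]}},
    ]
}

titles = [spec["title"] for spec in candidate_specs(doc)]
print(titles)
```

The outer object without a data key is skipped, while the object nested inside it is still picked up.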


Answer 2

Score: 1


If you are okay with having names: null or names: [] in the final document for your first example, the following looks like a simple solution:

{ title }
+ (.data | {
    ids: map(.ids[].id),
    names: (map(.names[].name)? // []) # or // null
})

or equivalent:

{
    title,
    ids: (.data | map(.ids[].id)),
    names: (.data | map(.names[].name)? // [])
}

Output 1:

{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": []
}

Output 2:

{
  "title": "Some more data",
  "ids": [
    "123",
    "abc"
  ],
  "names": [
    "A",
    "B"
  ]
}
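
The (map(.names[].name)? // []) part behaves roughly like a try/except with a default. A minimal Python sketch, with the hypothetical helper name names_or_default:

```python
def names_or_default(data, default):
    """Rough analogue of `map(.names[].name)? // DEFAULT`."""
    try:
        # item["names"] raises KeyError when the key is missing, standing in
        # for the error jq raises when iterating over null with `.names[]`.
        return [entry.get("name") for item in data for entry in item["names"]]
    except (KeyError, TypeError):
        return default


without_names = [{"ids": [{"id": "123"}, {"id": "abc"}]}]
with_names = [{"ids": [{"id": "123"}], "names": [{"name": "A"}, {"name": "B"}]}]

print(names_or_default(without_names, []))
print(names_or_default(with_names, []))
```

Only names has this guard in the filter above. ids: map(.ids[].id) has none, so a data item without an ids array (as in the first format of the question) would most likely fail with "Cannot iterate over null" unless a similar fallback is added.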

huangapple
  • Posted on 2023-06-05 16:15:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76404598.html