Merge two arrays of objects with a common key using the jq command
Question
This can be done with a shell script and the jq command. The following example script uses jq to merge the two datasets:
data1='[
{ "bookings": 2984, "timestamp": 1675854900 },
{ "bookings": 2967, "timestamp": 1675855200 }
]'
data2='[
{ "errors": 51, "timestamp": 1675854900 },
{ "errors": 90, "timestamp": 1675855200 }
]'
# Merge the two datasets with jq: group the records by timestamp, then add the objects in each group
combined=$(jq -n --argjson data1 "$data1" --argjson data2 "$data2" '($data1 + $data2) | group_by(.timestamp) | map(add)')
echo "combined='$combined'"
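The script first defines the two datasets, data1 and data2, and then uses jq to merge them into a new dataset, combined. During the merge, entries from the two datasets are matched on the timestamp field, so each output record combines the entries that share a timestamp value. Finally, the script prints the merged result.
Note: in practice you can save this script as a .sh file and run it with ./your_script.sh.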
I have two datasets:
data1='[
{ "bookings": 2984, "timestamp": 1675854900 },
{ "bookings": 2967, "timestamp": 1675855200 }
]'
data2='[
{ "errors": 51, "timestamp": 1675854900 },
{ "errors": 90, "timestamp": 1675855200 }
]'
I want the output to be:
combined='[
{ "errors": 51, "bookings": 2984, "timestamp": 1675854900 },
{ "errors": 90, "bookings": 2967, "timestamp": 1675855200 }
]'
Can this be achieved with shell scripting and the jq command?
Assume that timestamp will always be present and will always have a common value across two datasets. Even the order is same.
Answer 1
Score: 1
A simple JOIN operation could do:
jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
[JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)]
'
[
{
"errors": 51,
"timestamp": 1675854900,
"bookings": 2984
},
{
"errors": 90,
"timestamp": 1675855200,
"bookings": 2967
}
]
> I'm getting this error: jq: error: JOIN/4 is not defined at <top-level>, line 2: [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)] jq: 1 compile error
You are probably using an older version of jq. JOIN and INDEX were introduced in jq 1.6. Either define them yourself by taking their definitions from the source, or adapt those definitions to fit your use case (both approaches work well with jq 1.5).
Definitions from source:
jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
  def INDEX(stream; idx_expr):
    reduce stream as $row ({}; .[$row | idx_expr | tostring] = $row);
  def JOIN($idx; stream; idx_expr; join_expr):
    stream | [., $idx[idx_expr]] | join_expr;
  [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)]
'
Adapted to your use case:
jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
($data1 | with_entries(.key = (.value.timestamp | @text))) as $ix
| $data2 | map(. + $ix[.timestamp | @text])
'
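To see what the JOIN call is looking things up in, it can help to print the intermediate index on its own. The sketch below assumes jq 1.6 or later (for the INDEX builtin) and reuses the question's data1; the variable name index is only illustrative.

```shell
data1='[
  { "bookings": 2984, "timestamp": 1675854900 },
  { "bookings": 2967, "timestamp": 1675855200 }
]'

# INDEX builds an object whose keys are the stringified timestamps
# and whose values are the original data1 records.
index=$(jq -c -n --argjson data1 "$data1" 'INDEX($data1[]; .timestamp)')
echo "$index"
```

Because the keys of that object are strings, the JOIN call above converts each timestamp from data2 with @text before the lookup.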
Answer 2
Score: 1
In general, if you find JOIN a bit tricky to understand or use, then consider using INDEX for this type of problem. In the present case, you could get away with a trivially simple approach, e.g.:
jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
INDEX($data1[]; .timestamp) as $dict
| $data2 | map( . + $dict[.timestamp|tostring])
'
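As a rough illustration of how this lookup behaves when the question's assumption does not hold, the sketch below uses a hypothetical data1 that lacks one of the timestamps. It assumes jq 1.6 or later for INDEX. In jq, adding null to an object leaves the object unchanged, so an unmatched data2 record should pass through without a bookings field.

```shell
# Hypothetical input: data1 has no record for timestamp 1675855200.
data1='[ { "bookings": 2984, "timestamp": 1675854900 } ]'
data2='[
  { "errors": 51, "timestamp": 1675854900 },
  { "errors": 90, "timestamp": 1675855200 }
]'

merged=$(jq -c -n --argjson data1 "$data1" --argjson data2 "$data2" '
  INDEX($data1[]; .timestamp) as $dict
  # A missing key yields null, and object + null is the object itself.
  | $data2 | map(. + $dict[.timestamp | tostring])
')
echo "$merged"
```

The reverse case is different: a data1 record whose timestamp never appears in data2 is silently dropped, because the output is driven by data2.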
Answer 3
Score: 1
This last paragraph just caught my attention:
> Assume that timestamp will always be present and will always have a common value across two datasets. Even the order is same.
If this is truly the case then it is reasonable to assume that both arrays have the same length and their items are aligned respectively. Thus, there's no need to build up a hash-based INDEX, as accessing the items by their numeric keys (positions within the arrays) can already be achieved in constant time.
jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
$data1 | [keys[] | $data2[.] + $data1[.]]
'
[
{
"errors": 51,
"timestamp": 1675854900,
"bookings": 2984
},
{
"errors": 90,
"timestamp": 1675855200,
"bookings": 2967
}
]
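The same positional pairing can also be sketched with transpose, which is not what the answer uses but should be equivalent under its assumption of aligned, equal-length arrays. The length check is an added safeguard for illustration only, and the error message text is made up.

```shell
data1='[
  { "bookings": 2984, "timestamp": 1675854900 },
  { "bookings": 2967, "timestamp": 1675855200 }
]'
data2='[
  { "errors": 51, "timestamp": 1675854900 },
  { "errors": 90, "timestamp": 1675855200 }
]'

combined=$(jq -c -n --argjson data1 "$data1" --argjson data2 "$data2" '
  # Guard (an assumption of this sketch): refuse to pair arrays of different lengths.
  if ($data1 | length) != ($data2 | length)
  then error("data1 and data2 differ in length")
  # transpose pairs up the items at the same position; add merges each pair.
  else [$data1, $data2] | transpose | map(add)
  end
')
echo "$combined"
```

Note that this version pairs purely by position and never compares the timestamps, so it is only safe when the ordering guarantee from the question really holds.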
Answer 4
Score: 0
Another way to do this is to build a map from timestamps to error counts, and perform a lookup in it.
jq -n '
input as $data1
| input as $data2
| ($data2
| map({ "key": (.timestamp | tostring), "value": .errors })
| from_entries
) as $errors_by_timestamp
| $data1 | map(.errors = $errors_by_timestamp[(.timestamp | tostring)])
' <<<"$data1 $data2"
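The tostring calls matter because JSON object keys are always strings, so a numeric timestamp cannot be used directly to index the map. A minimal sketch of the difference, using a hypothetical one-entry lookup object:

```shell
# Hypothetical lookup object with a single stringified timestamp as key.
lookup='{"1675854900": 51}'

# Stringified key: the lookup succeeds.
hit=$(jq -n --argjson m "$lookup" '$m[1675854900 | tostring]')
echo "hit=$hit"

# Numeric key: jq refuses to index an object with a number and exits non-zero,
# so the fallback marker is printed instead.
miss=$(jq -n --argjson m "$lookup" '$m[1675854900]' 2>/dev/null || echo "error")
echo "miss=$miss"
```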
Answer 5
Score: 0
By the way, I have been trying to get this answer from an AI since this morning, and this time it finally gave me a correct solution:
#!/bin/bash
data1='[
{ "bookings": 2984, "timestamp": 1675854900 },
{ "bookings": 2967, "timestamp": 1675855200 }
]'
data2='[
{ "errors": 51, "timestamp": 1675854900 },
{ "errors": 90, "timestamp": 1675855200 }
]'
combined=$(jq -n --argjson d1 "$data1" --argjson d2 "$data2" '
# Concatenate both arrays, group the records by timestamp, then merge each group into one object
$d1 + $d2 | group_by(.timestamp) | map(
reduce .[] as $i ({}; . * $i)
)
')
echo "$combined"
Just pasting it here for you guys in case you didn't think of this method
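The reduce above merges with *, which in jq is a recursive merge for objects, whereas + only replaces top-level keys. For flat records like the ones in this question the two should give the same result; the nested objects below are made up purely to show where they differ.

```shell
# + replaces the whole value under "a" with the right-hand side.
plus=$(jq -c -n '{"a": {"x": 1}} + {"a": {"y": 2}}')
echo "plus=$plus"

# * merges recursively, so both nested keys survive.
star=$(jq -c -n '{"a": {"x": 1}} * {"a": {"y": 2}}')
echo "star=$star"
```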