使用jq命令合并具有相同键的两个对象数组

huangapple go评论68阅读模式
英文:

Merge two array of objects with common key using jq command

问题

这可以通过Shell脚本和jq命令来实现。以下是一个示例Shell脚本,它使用jq命令来合并两个数据集:

data1='[
    { "bookings": 2984, "timestamp": 1675854900 },
    { "bookings": 2967, "timestamp": 1675855200 }
]'

data2='[
    { "errors": 51, "timestamp": 1675854900 },
    { "errors": 90, "timestamp": 1675855200 }
]'

# 使用jq命令将两个数据集合并成一个新的数据集
combined=$(jq -n --argjson data1 "$data1" --argjson data2 "$data2" '($data1 + $data2) | group_by(.timestamp) | map({errors: .[0].errors, bookings: .[1].bookings, timestamp: .[0].timestamp})')

echo "combined='$combined'"

这个脚本中,首先定义了两个数据集data1和data2。然后,使用jq命令将它们合并成一个新的数据集combined。在合并过程中,通过timestamp字段将两个数据集中的条目关联在一起,以确保它们具有相同的timestamp值。最后,脚本将合并后的结果打印出来。

注意:在实际使用中,你可以将这个脚本保存为一个.sh文件,然后通过运行./your_script.sh来执行它。

英文:

I have two datasets:

data1='[
    { "bookings": 2984, "timestamp": 1675854900 },
    { "bookings": 2967, "timestamp": 1675855200 }
]'

data2='[
    { "errors": 51, "timestamp": 1675854900 },
    { "errors": 90, "timestamp": 1675855200 }
]'

I want the output to be:

combined='[
    { "errors": 51, bookings: 2984, "timestamp": 1675854900 },
    { "errors": 90, bookings: 2967, "timestamp": 1675855200 }
]'

Can this be achieved by shell scripting and jq command?

Assume that timestamp will always be present and will always have a common value across two datasets. Even the order is same.

答案1

得分: 1

A simple JOIN operation could do:

jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
  [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)]
'
[
  {
    "errors": 51,
    "timestamp": 1675854900,
    "bookings": 2984
  },
  {
    "errors": 90,
    "timestamp": 1675855200,
    "bookings": 2967
  }
]

I'm getting this error: jq: error: JOIN/4 is not defined at <top-level>, line 2: [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)] jq: 1 compile error

You are probably using an older version of jq. JOIN and INDEX were introduced in jq 1.6. Either define them yourself by taking their definitions from source, or take those definitions and modify them to fit your use case (both work well with jq 1.5).

Definitions from source:

jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
  def INDEX(stream; idx_expr):
    reduce stream as $row ({}; .[$row | idx_expr | tostring] = $row);
  def JOIN($idx; stream; idx_expr; join_expr):
    stream | [., $idx[idx_expr]] | join_expr;

  [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)]
'

Adapted to your use case:

jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
  ($data1 | with_entries(.key = (.value.timestamp | @text))) as $ix
  | $data2 | map(. + $ix[.timestamp | @text])
'
英文:

A simple JOIN operation could do:

jq -n --argjson data1 &quot;$data1&quot; --argjson data2 &quot;$data2&quot; &#39;
  [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)]
&#39;
[
  {
    &quot;errors&quot;: 51,
    &quot;timestamp&quot;: 1675854900,
    &quot;bookings&quot;: 2984
  },
  {
    &quot;errors&quot;: 90,
    &quot;timestamp&quot;: 1675855200,
    &quot;bookings&quot;: 2967
  }
]

> I'm getting this error: jq: error: JOIN/4 is not defined at &lt;top-level&gt;, line 2: [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)] jq: 1 compile error

You are probably using an older version of jq. JOIN and INDEX were introduced in jq 1.6. Either define them yourself by taking their definitions from source, or take those definitions and modify them to fit your very use case (both work well with jq 1.5).

Definitions from source:

jq -n --argjson data1 &quot;$data1&quot; --argjson data2 &quot;$data2&quot; &#39;
  def INDEX(stream; idx_expr):
    reduce stream as $row ({}; .[$row | idx_expr | tostring] = $row);
  def JOIN($idx; stream; idx_expr; join_expr):
    stream | [., $idx[idx_expr]] | join_expr;

  [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)]
&#39;

Adapted to your use case:

jq -n --argjson data1 &quot;$data1&quot; --argjson data2 &quot;$data2&quot; &#39;
  ($data1 | with_entries(.key = (.value.timestamp | @text))) as $ix
  | $data2 | map(. + $ix[.timestamp | @text])
&#39;

答案2

得分: 1

一般来说,如果您觉得JOIN有点难以理解或使用,那么考虑在解决这种问题时使用INDEX。在这种情况下,您可以采用一个非常简单的方法,例如:

jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
  INDEX($data1[]; .timestamp) as $dict
  | $data2 | map( . + $dict[.timestamp|tostring])
英文:

In general, if you find JOIN a bit tricky to understand or use, then consider using INDEX for this type of problem. In the present case, you could get away with a trivially simple approach, e.g.:

jq -n --argjson data1 &quot;$data1&quot;  --argjson data2 &quot;$data2&quot; &#39;
  INDEX($data1[]; .timestamp) as $dict
  | $data2 | map( . + $dict[.timestamp|tostring])

答案3

得分: 1

jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
  $data1 | [keys[] | $data2[.] + $data1[.]]
'
[
  {
    "errors": 51,
    "timestamp": 1675854900,
    "bookings": 2984
  },
  {
    "errors": 90,
    "timestamp": 1675855200,
    "bookings": 2967
  }
]
英文:

This last paragraph just caught my attention:

> Assume that timestamp will always be present and will always have a common value across two datasets. Even the order is same.

If this is truly the case then it is reasonable to assume that both arrays have the same length and their items are aligned respectively. Thus, there's no need to build up a hash-based INDEX as accessing the items by their numeric keys (positions within the arrays) can already be achieved in constant time.

jq -n --argjson data1 &quot;$data1&quot; --argjson data2 &quot;$data2&quot; &#39;
  $data1 | [keys[] | $data2[.] + $data1[.]]
&#39;
[
  {
    &quot;errors&quot;: 51,
    &quot;timestamp&quot;: 1675854900,
    &quot;bookings&quot;: 2984
  },
  {
    &quot;errors&quot;: 90,
    &quot;timestamp&quot;: 1675855200,
    &quot;bookings&quot;: 2967
  }
]

答案4

得分: 0

另一种方法是构建一个从时间戳到错误计数的映射,并在其中进行查找。

  input as $data1
| input as $data2
| ($data2
   | map({ "key": (.timestamp | tostring), "value": .errors }) 
   | from_entries
  ) as $errors_by_timestamp
| $data1 | map(.errors = $errors_by_timestamp[(.timestamp | tostring)])
&#39; &lt;&lt;&lt;"$data1 $data2"
英文:

Another way to do this is to build a map from timestamps to error counts, and perform a lookup in it.

jq -n &#39;
  input as $data1
| input as $data2
| ($data2
   | map({ &quot;key&quot;: (.timestamp | tostring), &quot;value&quot;: .errors }) 
   | from_entries
  ) as $errors_by_timestamp
| $data1 | map(.errors = $errors_by_timestamp[(.timestamp | tostring)])
&#39; &lt;&lt;&lt;&quot;$data1 $data2&quot;

答案5

得分: 0

顺便说一下,我从早上开始尝试这个答案,最后它也给了我正确的解决方案

#!/bin/bash

data1='[
    { "bookings": 2984, "timestamp": 1675854900 },
    { "bookings": 2967, "timestamp": 1675855200 }
]'

data2='[
    { "errors": 51, "timestamp": 1675854900 },
    { "errors": 90, "timestamp": 1675855200 }
]'

combined=$(jq -n --argjson d1 "$data1" --argjson d2 "$data2" '
  [ $d1, $d2 ] | transpose[] | group_by(.timestamp) | map(
    reduce .[] as $i ({}; . * $i)
  )
')

echo "$combined"
英文:

By the way, I have trying to this answer from AI since morning and finally it also gave me correct solution this time

#!/bin/bash

data1=&#39;[
    { &quot;bookings&quot;: 2984, &quot;timestamp&quot;: 1675854900 },
    { &quot;bookings&quot;: 2967, &quot;timestamp&quot;: 1675855200 }
]&#39;

data2=&#39;[
    { &quot;errors&quot;: 51, &quot;timestamp&quot;: 1675854900 },
    { &quot;errors&quot;: 90, &quot;timestamp&quot;: 1675855200 }
]&#39;

combined=$(jq -n --argjson d1 &quot;$data1&quot; --argjson d2 &quot;$data2&quot; &#39;
  [ $d1, $d2 ] | transpose[] | group_by(.timestamp) | map(
    reduce .[] as $i ({}; . * $i)
  )
&#39;)

echo &quot;$combined&quot;

Just pasting it here for you guys in case you didn't think of this method

huangapple
  • 本文由 发表于 2023年2月8日 22:56:16
  • 转载请务必保留本文链接:https://go.coder-hub.com/75387570.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定