2023年2月8日 22:56:16go评论68阅读模式

英文:

Merge two array of objects with common key using jq command

问题

这可以通过Shell脚本和jq命令来实现。以下是一个示例Shell脚本，它使用jq命令来合并两个数据集：

data1='[
    { "bookings": 2984, "timestamp": 1675854900 },
    { "bookings": 2967, "timestamp": 1675855200 }
]'

data2='[
    { "errors": 51, "timestamp": 1675854900 },
    { "errors": 90, "timestamp": 1675855200 }
]'

# 使用jq命令将两个数据集合并成一个新的数据集
combined=$(jq -n --argjson data1 "$data1" --argjson data2 "$data2" '($data1 + $data2) | group_by(.timestamp) | map({errors: .[0].errors, bookings: .[1].bookings, timestamp: .[0].timestamp})')

echo "combined='$combined'"

这个脚本中，首先定义了两个数据集data1和data2。然后，使用jq命令将它们合并成一个新的数据集combined。在合并过程中，通过timestamp字段将两个数据集中的条目关联在一起，以确保它们具有相同的timestamp值。最后，脚本将合并后的结果打印出来。

注意：在实际使用中，你可以将这个脚本保存为一个.sh文件，然后通过运行./your_script.sh来执行它。

英文:

I have two datasets:

data1=&#39;[
    { &quot;bookings&quot;: 2984, &quot;timestamp&quot;: 1675854900 },
    { &quot;bookings&quot;: 2967, &quot;timestamp&quot;: 1675855200 }
]&#39;

data2=&#39;[
    { &quot;errors&quot;: 51, &quot;timestamp&quot;: 1675854900 },
    { &quot;errors&quot;: 90, &quot;timestamp&quot;: 1675855200 }
]&#39;

I want the output to be:

combined=&#39;[
    { &quot;errors&quot;: 51, bookings: 2984, &quot;timestamp&quot;: 1675854900 },
    { &quot;errors&quot;: 90, bookings: 2967, &quot;timestamp&quot;: 1675855200 }
]&#39;

Can this be achieved by shell scripting and jq command?

Assume that timestamp will always be present and will always have a common value across two datasets. Even the order is same.

答案1

得分: 1

A simple JOIN operation could do:

jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
  [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)]
'

[
  {
    "errors": 51,
    "timestamp": 1675854900,
    "bookings": 2984
  },
  {
    "errors": 90,
    "timestamp": 1675855200,
    "bookings": 2967
  }
]

I'm getting this error: jq: error: JOIN/4 is not defined at <top-level>, line 2: [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)] jq: 1 compile error

You are probably using an older version of jq. JOIN and INDEX were introduced in jq 1.6. Either define them yourself by taking their definitions from source, or take those definitions and modify them to fit your use case (both work well with jq 1.5).

Definitions from source:

jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
  def INDEX(stream; idx_expr):
    reduce stream as $row ({}; .[$row | idx_expr | tostring] = $row);
  def JOIN($idx; stream; idx_expr; join_expr):
    stream | [., $idx[idx_expr]] | join_expr;

  [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)]
'

Adapted to your use case:

jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
  ($data1 | with_entries(.key = (.value.timestamp | @text))) as $ix
  | $data2 | map(. + $ix[.timestamp | @text])
'

英文:

A simple JOIN operation could do:

jq -n --argjson data1 &quot;$data1&quot; --argjson data2 &quot;$data2&quot; &#39;
  [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)]
&#39;

[
  {
    &quot;errors&quot;: 51,
    &quot;timestamp&quot;: 1675854900,
    &quot;bookings&quot;: 2984
  },
  {
    &quot;errors&quot;: 90,
    &quot;timestamp&quot;: 1675855200,
    &quot;bookings&quot;: 2967
  }
]

> I'm getting this error: jq: error: JOIN/4 is not defined at <top-level>, line 2: [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)] jq: 1 compile error

Definitions from source:

jq -n --argjson data1 &quot;$data1&quot; --argjson data2 &quot;$data2&quot; &#39;
  def INDEX(stream; idx_expr):
    reduce stream as $row ({}; .[$row | idx_expr | tostring] = $row);
  def JOIN($idx; stream; idx_expr; join_expr):
    stream | [., $idx[idx_expr]] | join_expr;

  [JOIN(INDEX($data1[]; .timestamp); $data2[]; .timestamp | @text; add)]
&#39;

Adapted to your use case:

jq -n --argjson data1 &quot;$data1&quot; --argjson data2 &quot;$data2&quot; &#39;
  ($data1 | with_entries(.key = (.value.timestamp | @text))) as $ix
  | $data2 | map(. + $ix[.timestamp | @text])
&#39;

答案2

得分: 1

一般来说，如果您觉得JOIN有点难以理解或使用，那么考虑在解决这种问题时使用INDEX。在这种情况下，您可以采用一个非常简单的方法，例如：

jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
  INDEX($data1[]; .timestamp) as $dict
  | $data2 | map( . + $dict[.timestamp|tostring])

英文:

In general, if you find JOIN a bit tricky to understand or use, then consider using INDEX for this type of problem. In the present case, you could get away with a trivially simple approach, e.g.:

jq -n --argjson data1 &quot;$data1&quot;  --argjson data2 &quot;$data2&quot; &#39;
  INDEX($data1[]; .timestamp) as $dict
  | $data2 | map( . + $dict[.timestamp|tostring])

答案3

得分: 1

jq -n --argjson data1 "$data1" --argjson data2 "$data2" '
  $data1 | [keys[] | $data2[.] + $data1[.]]
'

[
  {
    "errors": 51,
    "timestamp": 1675854900,
    "bookings": 2984
  },
  {
    "errors": 90,
    "timestamp": 1675855200,
    "bookings": 2967
  }
]

英文:

This last paragraph just caught my attention:

> Assume that timestamp will always be present and will always have a common value across two datasets. Even the order is same.

If this is truly the case then it is reasonable to assume that both arrays have the same length and their items are aligned respectively. Thus, there's no need to build up a hash-based INDEX as accessing the items by their numeric keys (positions within the arrays) can already be achieved in constant time.

jq -n --argjson data1 &quot;$data1&quot; --argjson data2 &quot;$data2&quot; &#39;
  $data1 | [keys[] | $data2[.] + $data1[.]]
&#39;

[
  {
    &quot;errors&quot;: 51,
    &quot;timestamp&quot;: 1675854900,
    &quot;bookings&quot;: 2984
  },
  {
    &quot;errors&quot;: 90,
    &quot;timestamp&quot;: 1675855200,
    &quot;bookings&quot;: 2967
  }
]

答案4

得分: 0

另一种方法是构建一个从时间戳到错误计数的映射，并在其中进行查找。

  input as $data1
| input as $data2
| ($data2
   | map({ "key": (.timestamp | tostring), "value": .errors }) 
   | from_entries
  ) as $errors_by_timestamp
| $data1 | map(.errors = $errors_by_timestamp[(.timestamp | tostring)])
&#39; &lt;&lt;&lt;"$data1 $data2"

英文:

Another way to do this is to build a map from timestamps to error counts, and perform a lookup in it.

jq -n &#39;
  input as $data1
| input as $data2
| ($data2
   | map({ &quot;key&quot;: (.timestamp | tostring), &quot;value&quot;: .errors }) 
   | from_entries
  ) as $errors_by_timestamp
| $data1 | map(.errors = $errors_by_timestamp[(.timestamp | tostring)])
&#39; &lt;&lt;&lt;&quot;$data1 $data2&quot;

答案5

得分: 0

顺便说一下，我从早上开始尝试这个答案，最后它也给了我正确的解决方案

#!/bin/bash

data1='[
    { "bookings": 2984, "timestamp": 1675854900 },
    { "bookings": 2967, "timestamp": 1675855200 }
]'

data2='[
    { "errors": 51, "timestamp": 1675854900 },
    { "errors": 90, "timestamp": 1675855200 }
]'

combined=$(jq -n --argjson d1 "$data1" --argjson d2 "$data2" '
  [ $d1, $d2 ] | transpose[] | group_by(.timestamp) | map(
    reduce .[] as $i ({}; . * $i)
  )
')

echo "$combined"

英文:

By the way, I have trying to this answer from AI since morning and finally it also gave me correct solution this time

#!/bin/bash

data1=&#39;[
    { &quot;bookings&quot;: 2984, &quot;timestamp&quot;: 1675854900 },
    { &quot;bookings&quot;: 2967, &quot;timestamp&quot;: 1675855200 }
]&#39;

data2=&#39;[
    { &quot;errors&quot;: 51, &quot;timestamp&quot;: 1675854900 },
    { &quot;errors&quot;: 90, &quot;timestamp&quot;: 1675855200 }
]&#39;

combined=$(jq -n --argjson d1 &quot;$data1&quot; --argjson d2 &quot;$data2&quot; &#39;
  [ $d1, $d2 ] | transpose[] | group_by(.timestamp) | map(
    reduce .[] as $i ({}; . * $i)
  )
&#39;)

echo &quot;$combined&quot;

Just pasting it here for you guys in case you didn't think of this method

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

使用jq命令合并具有相同键的两个对象数组

问题

答案1

答案2

答案3

答案4

答案5

go 1.5 : Is "go install" behaviour changed? Removing stale executables?

合并两个YAML文件深度合并在Bash中使用yq。

为什么bash会“忘记”我的后台进程？

获取bash数组中除了给定索引以外的所有元素。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论