从包含JSON的列中选择值的方法

huangapple go评论58阅读模式
英文:

How to select value from CSV containing JSON as a column

问题

我有一个奇怪的CSV文件,其中一列是相当大的JSON数据,另一列是一个名称。为了清晰起见,以下是正确的JSON格式,但实际文件是扁平的,没有换行字符。

name,
{
    "field1":
    {"name":"foo",
    "email":"test@gmail.com"},
"field2":{
     "subfield":{
          "subfield2":{
               "active":1,
               "passive":11,
               "running":111
           }
      }
   }
}

没有漂亮格式的输入...

name,{"field1":{"name":"foo","email":"test@gmail.com"},"field2":{"subfield":{"subfield2":{"active":1,"passive":11,"running":111}}}}

我正在尝试获取名称以及所有subfield 2的唯一值到输出(最好是CSV)。

name, 
active passive running

问题在于字段数量不是固定的,subfield的名称也不是固定的。我尝试过使用jq、Miller和sed/awk,但由于它结合了一个巨大的JSON块和CSV数据,所以运行不太顺利。


<details>
<summary>英文:</summary>

I have a weird CSV with one column being JSON data of quite a large size and one is a name. Showing proper JSON formatting for clarity, but the actual file is flat with no newline chars.


name,
{
"field1":
{"name":"foo",
"email":"test@gmail.com"},
"field2":{
"subfield":{
"subfield2":{
"active":1,
"passive":11,
"running":111
}
}
}
}


The input without pretty formatting…

```txt
name,{&quot;field1&quot;:{&quot;name&quot;:&quot;foo&quot;,&quot;email&quot;:&quot;test@gmail.com&quot;},&quot;field2&quot;:{&quot;subfield&quot;:{&quot;subfield2&quot;:{&quot;active&quot;:1,&quot;passive&quot;:11,&quot;running&quot;:111}}}}

I am trying to get the name and all unique values for the subfield 2 into a output (ideally CSV).

name, 
active passive running

The issue is that the number of fields is not constant and the names of the subfields are not constant either. I have tried using jq, Miller, and sed/awk without much luck since it combines a huge JSON blob and CSV data.

答案1

得分: 2

如果您在示例输入中运行以下命令:

&lt;input.txt sed &#39;/name,/d&#39; | jq -cr &#39;.field2.subfield.subfield2 | keys[]&#39; | paste -s -d &#39; &#39; | mlr --csv --implicit-csv-header then label name

您会得到以下结果:

name
active passive running

您写的示例输出是错误的,因为如果它是一个字段的 CSV,那么在 "name" 后面不应该有逗号,也没有其他字段:

name, 
active passive running

但我可能没有理解您想要的内容。

英文:

If you run in your sample input

&lt;input.txt sed &#39;/name,/d&#39; | jq -cr &#39;.field2.subfield.subfield2 | keys[]&#39; | paste -s -d &#39; &#39; | mlr --csv --implicit-csv-header then label name

you get

name
active passive running

The sample output you write, is wrong, because if it's a one field CSV, you do not have the , after name, there are no other fields

name, 
active passive running

But I probably didn't understand what you want

答案2

得分: 1

你可以使用你喜欢的CSV到TSV转换工具将CSV转换为TSV,然后像这样通过jq运行它:

jq -rR '
  split("\t") 
  | [[.0], 
     ( .[1]|fromjson|.field2.subfield.subfield2 | keys_unsorted|join(" "))] 
  | @csv'
英文:

You could use your favorite CSV-to-TSV translator to convert the CSV to TSV and then run it through jq like so:

jq -rR &#39;
  split(&quot;\t&quot;) 
  | [[.0], 
     ( .[1]|fromjson|.field2.subfield.subfield2 | keys_unsorted|join(&quot; &quot;))] 
  | @csv&#39;

答案3

得分: -2

我用sed花了一些时间,使用以下命令得到了我需要的内容:

sed -e 's/.*subfield2\(.*\)}}}.*//' input.txt
英文:

I spent some time with sed and it got me what I needed using the below:

sed -e &#39;s/.*subfield2\(.*\)}}}.*//&#39; input.txt

huangapple
  • 本文由 发表于 2023年6月15日 05:03:52
  • 转载请务必保留本文链接:https://go.coder-hub.com/76477520.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定