英文:
How to select value from CSV containing JSON as a column
问题
我有一个奇怪的CSV文件,其中一列是相当大的JSON数据,另一列是一个名称。为了清晰起见,以下是正确的JSON格式,但实际文件是扁平的,没有换行字符。
name,
{
"field1":
{"name":"foo",
"email":"test@gmail.com"},
"field2":{
"subfield":{
"subfield2":{
"active":1,
"passive":11,
"running":111
}
}
}
}
没有漂亮格式的输入...
name,{"field1":{"name":"foo","email":"test@gmail.com"},"field2":{"subfield":{"subfield2":{"active":1,"passive":11,"running":111}}}}
我正在尝试获取名称以及所有subfield 2的唯一值到输出(最好是CSV)。
name,
active passive running
问题在于字段数量不是固定的,subfield的名称也不是固定的。我尝试过使用jq、Miller和sed/awk,但由于它结合了一个巨大的JSON块和CSV数据,所以运行不太顺利。
<details>
<summary>英文:</summary>
I have a weird CSV with one column being JSON data of quite a large size and one is a name. Showing proper JSON formatting for clarity, but the actual file is flat with no newline chars.
name,
{
"field1":
{"name":"foo",
"email":"test@gmail.com"},
"field2":{
"subfield":{
"subfield2":{
"active":1,
"passive":11,
"running":111
}
}
}
}
The input without pretty formatting…
```txt
name,{"field1":{"name":"foo","email":"test@gmail.com"},"field2":{"subfield":{"subfield2":{"active":1,"passive":11,"running":111}}}}
I am trying to get the name and all unique values for the subfield 2 into a output (ideally CSV).
name,
active passive running
The issue is that the number of fields is not constant and the names of the subfields are not constant either. I have tried using jq, Miller, and sed/awk without much luck since it combines a huge JSON blob and CSV data.
答案1
得分: 2
如果您在示例输入中运行以下命令:
<input.txt sed '/name,/d' | jq -cr '.field2.subfield.subfield2 | keys[]' | paste -s -d ' ' | mlr --csv --implicit-csv-header then label name
您会得到以下结果:
name
active passive running
您写的示例输出是错误的,因为如果它是一个字段的 CSV,那么在 "name" 后面不应该有逗号,也没有其他字段:
name,
active passive running
但我可能没有理解您想要的内容。
英文:
If you run in your sample input
<input.txt sed '/name,/d' | jq -cr '.field2.subfield.subfield2 | keys[]' | paste -s -d ' ' | mlr --csv --implicit-csv-header then label name
you get
name
active passive running
The sample output you write, is wrong, because if it's a one field CSV, you do not have the ,
after name, there are no other fields
name,
active passive running
But I probably didn't understand what you want
答案2
得分: 1
你可以使用你喜欢的CSV到TSV转换工具将CSV转换为TSV,然后像这样通过jq运行它:
jq -rR '
split("\t")
| [[.0],
( .[1]|fromjson|.field2.subfield.subfield2 | keys_unsorted|join(" "))]
| @csv'
英文:
You could use your favorite CSV-to-TSV translator to convert the CSV to TSV and then run it through jq like so:
jq -rR '
split("\t")
| [[.0],
( .[1]|fromjson|.field2.subfield.subfield2 | keys_unsorted|join(" "))]
| @csv'
答案3
得分: -2
我用sed花了一些时间,使用以下命令得到了我需要的内容:
sed -e 's/.*subfield2\(.*\)}}}.*//' input.txt
英文:
I spent some time with sed and it got me what I needed using the below:
sed -e 's/.*subfield2\(.*\)}}}.*//' input.txt
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论