JSON to Avro in Python
Question
I am trying to convert JSON into Avro using the following code:
```python
from fastavro import writer, reader, schema
from rec_avro import to_rec_avro_destructive, from_rec_avro_destructive, rec_avro_schema

# all_obj holds the JSON records (as Python dicts) to convert
avro_objects = (to_rec_avro_destructive(rec) for rec in all_obj[:100])

with open('json_in_avro.avro', 'wb') as f_out:
    writer(f_out, schema.parse_schema(rec_avro_schema()), avro_objects)
```
It works fine; however, the structure of the JSON records in `all_obj` changes from this:
```python
{
    'Date': '2023-07-16',
    'Url': 'google.pt',
    'Item': {'Title': 'abababab',
             'Id': '28e3c5n',
             'DedupId': None,
             'ImageHash': None,
             'Attributes': {'Cost': None,
                            'Area': None},
             'PropertyDetail': []}
}
```
To this:
```python
{
    'Date': '2023-07-16',
    'Url': 'google.pt',
    'Item': {'_': {'Title': 'abababab',
                   'Id': '28e3c5n',
                   'DedupId': None,
                   'ImageHash': None,
                   'Attributes': {'_': {'Cost': None,
                                        'Area': None}},
                   'PropertyDetail': {'_': []}}}
}
```
Any reason why it seems to create these `'_'` items?

Thank you for the help!
# Answer 1
**Score**: 1
The `_` wrappers come from the `rec_avro` library. However, you don't need that library as long as you define a schema for your data (which you should do anyway).
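The wrappers also end up in `all_obj` itself because, judging by its name and by the output shown in the question, `to_rec_avro_destructive` rewrites the records it is given instead of returning new ones. The sketch below illustrates that effect with a hypothetical `wrap_in_place` helper that only mimics the wrapping shown in the question (it is not rec_avro's implementation) and a trimmed-down record. Handing a destructive converter `copy.deepcopy(rec)` instead of `rec` is one way to keep the source data intact.

```python
import copy


def wrap_in_place(obj):
    """Hypothetical stand-in, NOT rec_avro's code: replaces every nested
    dict or list value with {'_': value}, mutating obj in place, to mimic
    the output shown in the question."""
    if isinstance(obj, dict):
        slots = obj.items()
    elif isinstance(obj, list):
        slots = enumerate(obj)
    else:
        return obj
    # Snapshot the (key, value) pairs before assigning into obj.
    for key, value in list(slots):
        if isinstance(value, (dict, list)):
            wrap_in_place(value)
            obj[key] = {'_': value}
    return obj


# Trimmed-down version of the record from the question.
all_obj = [{
    'Date': '2023-07-16',
    'Item': {'Title': 'abababab',
             'Attributes': {'Cost': None},
             'PropertyDetail': []},
}]
snapshot = copy.deepcopy(all_obj)

# Converting the records directly rewrites all_obj as a side effect.
converted = [wrap_in_place(rec) for rec in all_obj]
print(all_obj[0]['Item'])
# {'_': {'Title': 'abababab', 'Attributes': {'_': {'Cost': None}}, 'PropertyDetail': {'_': []}}}

# Converting deep copies produces the same output but leaves the source untouched.
source = copy.deepcopy(snapshot)
converted_safe = [wrap_in_place(copy.deepcopy(rec)) for rec in source]
print(source == snapshot)  # True
```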
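I don't know all the details of your schema, but I took my best guess, and here's a script that should be pretty close to what you are looking for:
```python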
import fastavro
schema = {
"name": "YourName",
"type": "record",
"fields": [
{"name": "Date", "type": "string"},
{"name": "Url", "type": "string"},
{
"name": "Item",
"type": {
"type": "record",
"name": "YourItem",
"fields": [
{"name": "Title", "type": "string"},
{"name": "Id", "type": ["null", "string"]},
{"name": "DedupId", "type": ["null", "string"]},
{"name": "ImageHash", "type": ["null", "string"]},
{
"name": "Attributes",
"type": {
"type": "record",
"name": "YourAttributes",
"fields": [
{"name": "Cost", "type": ["null", "string"]},
{"name": "Area", "type": ["null", "string"]},
],
},
},
{"name": "PropertyDetail", "type": {"type": "array", "items": "string"}},
],
},
},
]
}
records = [
{
'Date': '2023-07-16',
'Url': 'google.pt',
'Item': {
'Title': 'abababab',
'Id': '28e3c5n',
'DedupId': None,
'ImageHash': None,
'Attributes': {
'Cost': None,
'Area': None,
},
'PropertyDetail': [],
}
}
]
# Write the records with the explicit schema; no rec_avro wrapping is needed.
with open('json_in_avro.avro', 'wb') as fp:
    fastavro.writer(fp, schema, records)
```