JSON to Avro in Python

Question

I am trying to convert JSON into Avro using the following code:

    from fastavro import writer, reader, schema
    from rec_avro import to_rec_avro_destructive, from_rec_avro_destructive, rec_avro_schema

    avro_objects = (to_rec_avro_destructive(rec) for rec in all_obj[:100])

    with open('json_in_avro.avro', 'wb') as f_out:
        writer(f_out, schema.parse_schema(rec_avro_schema()), avro_objects)

It works fine; however, the structure of the JSON records in `all_obj` goes from this:

    {
    'Date': '2023-07-16',
    'Url': 'google.pt',
    'Item': {'Title': 'abababab',
    'Id': '28e3c5n',
    'DedupId': None,
    'ImageHash': None,
    'Attributes': {'Cost': None,
      'Area': None},
    'PropertyDetail':  []}
     }

To this:

    {
    'Date': '2023-07-16',
    'Url': 'google.pt',
    'Item': {'_': {'Title': 'abababab',
    'Id': '28e3c5n',
    'DedupId': None,
    'ImageHash': None,
    'Attributes': {'_': {'Cost': None,
      'Area': None}},
    'PropertyDetail': {'_': []}}}
     }

Is there any reason why it creates these '_' items?

Thank you for the help!




# Answer 1
**Score**: 1

The `_` entries come from the `rec_avro` library. However, you don't need that library as long as you define a schema for your data (which you should do).

I don't know all the details of your schema, but I took my best guess, and here's a script that should be pretty close to what you are looking for:

```python
import fastavro

# Best-guess schema for the sample record; adjust the types to your real data.
# Fields that can be None are declared as a union with "null".
schema = {
  "name": "YourName",
  "type": "record",
  "fields": [
    {"name": "Date", "type": "string"},
    {"name": "Url", "type": "string"},
    {
      "name": "Item",
      "type": {
        "type": "record",
        "name": "YourItem",
        "fields": [
          {"name": "Title", "type": "string"},
          {"name": "Id", "type": ["null", "string"]},
          {"name": "DedupId", "type": ["null", "string"]},
          {"name": "ImageHash", "type": ["null", "string"]},
          {
            "name": "Attributes",
            "type": {
              "type": "record",
              "name": "YourAttributes",
              "fields": [
                {"name": "Cost", "type": ["null", "string"]},
                {"name": "Area", "type": ["null", "string"]},
              ],
            },
          },
          {"name": "PropertyDetail", "type": {"type": "array", "items": "string"}},
        ],
      },
    },
  ]
}

# The records keep their original shape: no '_' wrappers are needed.
records = [
    {
        'Date': '2023-07-16',
        'Url': 'google.pt',
        'Item': {
            'Title': 'abababab',
            'Id': '28e3c5n',
            'DedupId': None,
            'ImageHash': None,
            'Attributes': {
                'Cost': None,
                'Area': None,
            },
            'PropertyDetail':  [],
        }
    }
]

with open('json_in_avro.avro', 'wb') as fp:
    fastavro.writer(fp, schema, records)
```

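
As for why the `_` keys show up at all: as far as I understand, a generic schema that must accept arbitrary nesting cannot name the fields in advance, so every nested object or list gets wrapped in a record with a single field. The sketch below only mimics the shape shown in the question in plain Python. The functions `wrap` and `unwrap` are hypothetical helpers and are not `rec_avro`'s actual implementation; for real files, the library's own `from_rec_avro_destructive` (imported in the question) is the intended way back.

```python
def wrap(value):
    """Wrap every nested dict or list in a single-key {'_': ...} container."""
    if isinstance(value, dict):
        return {"_": {key: wrap(item) for key, item in value.items()}}
    if isinstance(value, list):
        return {"_": [wrap(item) for item in value]}
    return value


def unwrap(value):
    """Undo wrap(): drop the {'_': ...} containers recursively.

    Assumption: a dict whose only key is '_' is always a wrapper, so genuine
    data using '_' as its sole key would be misread.
    """
    if isinstance(value, dict):
        if set(value) == {"_"}:
            return unwrap(value["_"])
        return {key: unwrap(item) for key, item in value.items()}
    if isinstance(value, list):
        return [unwrap(item) for item in value]
    return value


record = {
    "Date": "2023-07-16",
    "Url": "google.pt",
    "Item": {
        "Title": "abababab",
        "Attributes": {"Cost": None, "Area": None},
        "PropertyDetail": [],
    },
}

# Wrap each top-level field value, which reproduces the shape in the question.
wrapped = {key: wrap(item) for key, item in record.items()}
restored = unwrap(wrapped)
print(wrapped["Item"]["_"]["PropertyDetail"])
print(restored == record)
```

With a hand-written schema like the one in the answer, none of this wrapping is necessary, because each nested record already has a declared name and field list.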



huangapple
  • Posted on July 18, 2023 at 00:51:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76706589.html