Converting JSON to Avro in Python


Question

I am trying to convert JSON into Avro using the following code:

    from fastavro import writer, reader, schema
    from rec_avro import to_rec_avro_destructive, from_rec_avro_destructive, rec_avro_schema

    # all_obj is the list of JSON-like dicts to convert (defined elsewhere)
    avro_objects = (to_rec_avro_destructive(rec) for rec in all_obj[:100])

    with open('json_in_avro.avro', 'wb') as f_out:
        writer(f_out, schema.parse_schema(rec_avro_schema()), avro_objects)

It works, but the structure of the JSON objects in all_obj changes from this:

    {
        'Date': '2023-07-16',
        'Url': 'google.pt',
        'Item': {'Title': 'abababab',
                 'Id': '28e3c5n',
                 'DedupId': None,
                 'ImageHash': None,
                 'Attributes': {'Cost': None,
                                'Area': None},
                 'PropertyDetail': []}
    }

to this:

    {
        'Date': '2023-07-16',
        'Url': 'google.pt',
        'Item': {'_': {'Title': 'abababab',
                       'Id': '28e3c5n',
                       'DedupId': None,
                       'ImageHash': None,
                       'Attributes': {'_': {'Cost': None,
                                            'Area': None}},
                       'PropertyDetail': {'_': []}}}
    }

Why does it create these '_' items?

Thank you for the help!

# Answer 1

**Score**: 1

The `_` keys come from the `rec_avro` library. However, you don't need that library as long as you define a schema for your data (which you should do anyway).

I don't know all the details of your schema, but I took my best guess; here is a script that should be close to what you are looking for:

```python
import fastavro

# Explicit schema: each nested JSON object becomes a nested Avro record.
# Fields that can be None use a ["null", "string"] union.
schema = {
    "name": "YourName",
    "type": "record",
    "fields": [
        {"name": "Date", "type": "string"},
        {"name": "Url", "type": "string"},
        {
            "name": "Item",
            "type": {
                "type": "record",
                "name": "YourItem",
                "fields": [
                    {"name": "Title", "type": "string"},
                    {"name": "Id", "type": ["null", "string"]},
                    {"name": "DedupId", "type": ["null", "string"]},
                    {"name": "ImageHash", "type": ["null", "string"]},
                    {
                        "name": "Attributes",
                        "type": {
                            "type": "record",
                            "name": "YourAttributes",
                            "fields": [
                                # Value types are a guess; the sample only shows None.
                                {"name": "Cost", "type": ["null", "string"]},
                                {"name": "Area", "type": ["null", "string"]},
                            ],
                        },
                    },
                    # Item type is a guess; the sample list is empty.
                    {"name": "PropertyDetail", "type": {"type": "array", "items": "string"}},
                ],
            },
        },
    ]
}

records = [
    {
        'Date': '2023-07-16',
        'Url': 'google.pt',
        'Item': {
            'Title': 'abababab',
            'Id': '28e3c5n',
            'DedupId': None,
            'ImageHash': None,
            'Attributes': {
                'Cost': None,
                'Area': None,
            },
            'PropertyDetail': [],
        }
    }
]

# Write the records to an Avro container file using the schema above.
with open('json_in_avro.avro', 'wb') as fp:
    fastavro.writer(fp, schema, records)
```
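
If data has already been converted with `rec_avro`, the wrappers can also be stripped afterwards. The question imports `from_rec_avro_destructive`, which is presumably the library's own way to reverse the conversion. The helper below, `unwrap_underscores`, is a hypothetical, library-free sketch that assumes every dict whose only key is `'_'` is such a wrapper, as in the output shown in the question.

```python
def unwrap_underscores(value):
    """Recursively replace {'_': x} wrappers with x (hypothetical helper)."""
    if isinstance(value, dict):
        # Assumption: a dict whose only key is '_' is a wrapper around one value.
        if set(value) == {'_'}:
            return unwrap_underscores(value['_'])
        return {key: unwrap_underscores(item) for key, item in value.items()}
    if isinstance(value, list):
        return [unwrap_underscores(item) for item in value]
    return value


# The wrapped structure from the question.
wrapped = {
    'Date': '2023-07-16',
    'Url': 'google.pt',
    'Item': {'_': {
        'Title': 'abababab',
        'Id': '28e3c5n',
        'DedupId': None,
        'ImageHash': None,
        'Attributes': {'_': {'Cost': None, 'Area': None}},
        'PropertyDetail': {'_': []},
    }},
}

restored = unwrap_underscores(wrapped)
print(restored['Item']['Attributes'])  # prints {'Cost': None, 'Area': None}
```

Note that this would also unwrap a genuine dict that happens to have `'_'` as its only key, so it is only safe when the original data never contains such a dict.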

huangapple
  • Posted on 2023-07-18 00:51:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76706589.html