Why does reaching nested meta give NaN when normalizing a JSON with pandas?

Question


My input is a Python dictionary (JSON-like):

```py
d = {
    "type": "type1",
    "details": {
        "name": "foo",
        "date": {
            "timestamp": "01/02/2023 21:42:44",
            "components": {
                "day": 2,
                "month": 1,
                "year": 2023,
                "time": "21:42:44"
            }
        }
    },
    "infos": {
        "records": [
            {
                "field1": "qux",
                "field2": "baz",
            }
        ],
        "class": "P"
    }
}
```

I'm using the code below:

```py
import pandas as pd

df = pd.json_normalize(
    d,
    record_path=["infos", "records"],
    meta=[
        "type",
        ["details", "date", "timestamp"],
        ["details", "date", "components", "year"],
        ["infos", "class"]
    ],
    errors="ignore"
)
```

This gives me the following output:

```
  field1 field2   type details.date.timestamp details.date.components.year infos.class
0    qux    baz  type1                    NaN                          NaN           P
```

But I'm expecting this one:

```
  field1 field2   type details.date.timestamp details.date.components.year infos.class
0    qux    baz  type1    01/02/2023 21:42:44                         2023           P
```

To be honest, I'm going crazy with the `meta` parameter! I don't know what I'm doing wrong.

Can you explain its logic, please?

Answer 1

Score: 2

I think you should put an extra `[]` in `record_path=`:

```py
df = pd.json_normalize(
    d,
    record_path=[["infos", "records"]],  # <-- put [] here
    meta=[
        "type",
        ["details", "date", "timestamp"],
        ["details", "date", "components", "year"],
        ["infos", "class"],
    ],
    errors="ignore",
)
print(df)
```

Prints:

```
  field1 field2   type details.date.timestamp details.date.components.year infos.class
0    qux    baz  type1    01/02/2023 21:42:44                         2023           P
```
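To make the expected result concrete, here is a minimal sketch in plain Python (no pandas) that builds the same row by hand. It assumes every meta path is looked up from the top-level dictionary, which matches the output the asker expects. `get_path`, `META_PATHS` and `rows` are hypothetical names used only for this illustration, and this is not a description of how `json_normalize` works internally.

```python
# Sample input copied from the question.
d = {
    "type": "type1",
    "details": {
        "name": "foo",
        "date": {
            "timestamp": "01/02/2023 21:42:44",
            "components": {"day": 2, "month": 1, "year": 2023, "time": "21:42:44"},
        },
    },
    "infos": {
        "records": [{"field1": "qux", "field2": "baz"}],
        "class": "P",
    },
}

# Meta paths from the question, each written as a list of keys (hypothetical name).
META_PATHS = [
    ["type"],
    ["details", "date", "timestamp"],
    ["details", "date", "components", "year"],
    ["infos", "class"],
]


def get_path(obj, path):
    """Walk nested dictionaries along `path`; return None if a key is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


# Resolve each meta value once, always starting from the top-level dictionary.
meta = {".".join(path): get_path(d, path) for path in META_PATHS}

# One output row per record, each carrying a copy of the meta values.
rows = [{**record, **meta} for record in get_path(d, ["infos", "records"])]

print(rows)
```

Passing `rows` to `pd.DataFrame(rows)` should give a frame with the same columns as the corrected `json_normalize` call above.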

huangapple
  • Posted on July 24, 2023, 15:05:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76752107.html