pandas的`merge`在多级索引连接时会丢弃级别

huangapple go评论96阅读模式
英文:

pandas' `merge` drops levels on multiindex joins

问题

I don't understand why merge call below dropped the zero level l0 while join didn't? I don't see that behavior described in the docs. Any explanation?

  1. import string
  2. import pandas as pd
  3. alph = string.ascii_lowercase
  4. n=5
  5. inds = pd.MultiIndex.from_tuples([(i,j) for i in alph[:n] for j in range(1,n)])
  6. t = pd.DataFrame(data=np.random.randint(0,10, len(inds)), index=inds).sort_index()
  7. t.index.names=['l0', 'l1']
  8. t2 = pd.DataFrame(data = [222,333], index=[2,3])
  9. t2.index.names = ['l1']
  10. display(t, t2)
  11. display(t.merge(t2,how='left', on='l1'))
  12. display(t.join(t2,how='left', on='l1', lsuffix='_x', rsuffix='_y'))
英文:

I don't understand why merge call below dropped the zero level l0 while join didn't? I don't see that behavior described in the docs. Any explanation?

  1. import string
  2. import pandas as pd
  3. alph = string.ascii_lowercase
  4. n=5
  5. inds = pd.MultiIndex.from_tuples([(i,j) for i in alph[:n] for j in range(1,n)])
  6. t = pd.DataFrame(data=np.random.randint(0,10, len(inds)), index=inds).sort_index()
  7. t.index.names=['l0', 'l1']
  8. t2 = pd.DataFrame(data = [222,333], index=[2,3])
  9. t2.index.names = ['l1']
  10. display(t, t2)
  11. display(t.merge(t2,how='left', on='l1'))
  12. display(t.join(t2,how='left', on='l1', lsuffix='_x', rsuffix='_y'))

答案1

得分: 1

你可以在 merge 调用中使用 set_index() 并给它传递 tindex 来保留你的 l0 索引:

  1. t.merge(t2, how='left', on='l1').set_index(t.index)

我运行代码时打印了上面的内容,得到了以下结果:

  1. 0_x 0_y
  2. l0 l1
  3. a 1 5 NaN
  4. 2 7 222.0
  5. 3 3 333.0
  6. 4 5 NaN
  7. b 1 4 NaN
  8. 2 2 222.0
  9. 3 0 333.0
  10. 4 8 NaN
  11. c 1 8 NaN
  12. 2 2 222.0
  13. 3 9 333.0
  14. 4 9 NaN
  15. d 1 3 NaN
  16. 2 4 222.0
  17. 3 2 333.0
  18. 4 1 NaN
  19. e 1 7 NaN
  20. 2 1 222.0
  21. 3 2 333.0
  22. 4 0 NaN
英文:

You can use set_index() in your merge call and give it the index of t to preserve your l0 index:

  1. t.merge(t2,how='left', on='l1').set_index(t.index)

Printing the above when I ran the code gave me the following:

  1. 0_x 0_y
  2. l0 l1
  3. a 1 5 NaN
  4. 2 7 222.0
  5. 3 3 333.0
  6. 4 5 NaN
  7. b 1 4 NaN
  8. 2 2 222.0
  9. 3 0 333.0
  10. 4 8 NaN
  11. c 1 8 NaN
  12. 2 2 222.0
  13. 3 9 333.0
  14. 4 9 NaN
  15. d 1 3 NaN
  16. 2 4 222.0
  17. 3 2 333.0
  18. 4 1 NaN
  19. e 1 7 NaN
  20. 2 1 222.0
  21. 3 2 333.0
  22. 4 0 NaN

huangapple
  • 本文由 发表于 2023年4月17日 09:07:37
  • 转载请务必保留本文链接:https://go.coder-hub.com/76031100.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定